Pinned Public Evidence

Code-retrieval quality and cost

1000 held-out queries across 4 public CoIR tasks, repeated 3 times. No private corpus or local path is included.

1000 held-out queries

4 public tasks

50 languages

3 repetitions

Aggregate results

Mode	nDCG@10	MRR@10	P@5	R@20	Warm p95	Index size
`neural`	0.2620	0.2178	0.0561	0.5080	220.14 ms	134.25 MiB

Population variance, phase timings, peak RSS, binary identity, dataset revisions, and checksums are retained in the raw JSON.
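For readers unfamiliar with the column headers, the following is a minimal sketch of how these four metrics are conventionally computed for a single query. It assumes binary relevance and the common log2 discount for nDCG; the source does not state which variant the benchmark harness uses, and the function names and document ids here are hypothetical.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top k (R@k)."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def mrr_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant result in the top k, else 0."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k with a log2 discount (assumed variant)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    # Ideal ranking places every relevant document at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: two relevant documents, found at ranks 2 and 4.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
scores = {
    "nDCG@10": ndcg_at_k(ranked, relevant, 10),
    "MRR@10": mrr_at_k(ranked, relevant, 10),
    "P@5": precision_at_k(ranked, relevant, 5),
    "R@20": recall_at_k(ranked, relevant, 20),
}
```

A table value such as the aggregate nDCG@10 would then be an average of per-query scores; how the harness weights tasks and repetitions in that average is not specified here.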

Change from frozen baseline

`neural` improves nDCG@10 by 12.77% and MRR@10 by 13.22% over `hash` at commit 49b1571de77a. The raw JSON retains every task and run.
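As a sketch of how such a percentage is typically derived, the helper below computes a relative change between a baseline score and a candidate score. It assumes the reported figures are relative (not absolute) differences, which the source does not state explicitly; the function name and the example values are hypothetical and are not taken from the raw JSON.

```python
def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (candidate - baseline) / baseline * 100.0


# Illustrative values only: a baseline of 0.20 and a candidate of 0.25
# correspond to a relative improvement of about 25%.
change = relative_change_pct(0.20, 0.25)
```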

Per-task quality

Task	Mode	nDCG@10	MRR@10	R@20
`codetrans-dl`	`neural`	0.2365	0.1454	0.7796
`codetrans-contest`	`neural`	0.4269	0.3921	0.5973
`cosqa`	`neural`	0.1464	0.1130	0.3413
`codefeedback-st`	`neural`	0.5243	0.4897	0.6566

Every retained task remains visible so aggregate improvements cannot hide regressions.
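The point about hidden regressions can be illustrated with a small per-task check. This is a sketch under stated assumptions: the function name, the tolerance parameter, and the task names and scores are all hypothetical, and it does not describe how the actual harness reports regressions.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Return tasks whose candidate score falls below baseline by more than tolerance."""
    missing = baseline.keys() - candidate.keys()
    if missing:
        # A task that silently disappears would also hide a regression.
        raise ValueError(f"candidate is missing tasks: {sorted(missing)}")
    return [
        task
        for task in sorted(baseline)
        if candidate[task] < baseline[task] - tolerance
    ]


# Hypothetical scores: the mean rises from 0.400 to 0.425,
# yet one task gets worse.
baseline_scores = {"task-a": 0.30, "task-b": 0.50}
candidate_scores = {"task-a": 0.40, "task-b": 0.45}
regressed = find_regressions(baseline_scores, candidate_scores)
```

In this illustrative case the aggregate improves while `task-b` regresses, which is the situation a per-task table makes visible.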

Claim boundary

This is a reproducible baseline, not a state-of-the-art claim. Exact-search tools are compared only on exact-query workloads; this matrix instead covers held-out natural-language and code-to-code retrieval. Neural and external embedding upper bounds use the same matrix when their profiles are evaluated.