Code-retrieval quality and cost
Results cover 1,000 held-out queries across 4 public CoIR tasks, with each evaluation repeated 3 times. No private corpus or local path is included.
Aggregate results
| Mode | nDCG@10 | MRR@10 | P@5 | R@20 | Warm p95 | Index size |
|---|---|---|---|---|---|---|
| neural | 0.2620 | 0.2178 | 0.0561 | 0.5080 | 220.14 ms | 134.25 MiB |
Population variance, phase timings, peak RSS, binary identity, dataset revisions, and checksums are retained in the raw JSON.
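For readers unfamiliar with the ranking metrics in the table, the following is a minimal sketch of how nDCG@10, MRR@10, and R@20 are conventionally computed for a single query. It assumes binary relevance labels and the standard textbook definitions. The function names are hypothetical, and the benchmark harness may differ in details such as graded relevance or tie handling.

```python
import math


def mrr_at_k(ranked_ids, relevant_ids, k=10):
    """Reciprocal rank of the first relevant hit in the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    # The ideal ranking places every relevant document first.
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)


# Illustrative query: the only relevant document is ranked second.
ranking = ["d3", "d1", "d7"]
relevant = {"d1"}
print(mrr_at_k(ranking, relevant))  # 0.5
```

Per-query scores like these are typically averaged over all queries to obtain the reported figures. How this report weights tasks in the aggregate row is not stated here.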
Change from frozen baseline
The neural mode improves nDCG@10 by 12.77% and MRR@10 by 13.22% over the hash baseline frozen at commit 49b1571de77a. The raw JSON retains every task and run.
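As a sketch of how a percentage change like this is usually derived, assuming the figures are relative changes against the baseline score (the report does not state this explicitly): the helper name and the input values below are purely illustrative and are not taken from the raw JSON.

```python
def relative_change_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return (candidate - baseline) / baseline * 100.0


# Illustrative values only: a metric moving from 0.20 to 0.25.
print(f"{relative_change_pct(0.20, 0.25):+.2f}%")  # +25.00%
```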
Per-task quality
| Task | Mode | nDCG@10 | MRR@10 | R@20 |
|---|---|---|---|---|
| codetrans-dl | neural | 0.2365 | 0.1454 | 0.7796 |
| codetrans-contest | neural | 0.4269 | 0.3921 | 0.5973 |
| cosqa | neural | 0.1464 | 0.1130 | 0.3413 |
| codefeedback-st | neural | 0.5243 | 0.4897 | 0.6566 |
Every retained task remains visible so aggregate improvements cannot hide regressions.
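The point about hidden regressions can be illustrated with a small check that compares per-task metrics rather than the mean. This is a hedged sketch: the function name, the dictionary layout, and the task names and scores are all hypothetical and do not reflect the actual schema of the raw JSON.

```python
def find_regressions(baseline, candidate, tolerance=0.0):
    """Return {(task, metric): (base, cand)} for every metric that dropped.

    Both arguments map task name -> {metric name -> score}. Metrics that
    are absent from the candidate are skipped in this sketch.
    """
    regressions = {}
    for task, base_metrics in baseline.items():
        cand_metrics = candidate.get(task, {})
        for metric, base_value in base_metrics.items():
            cand_value = cand_metrics.get(metric)
            if cand_value is None:
                continue
            if cand_value < base_value - tolerance:
                regressions[(task, metric)] = (base_value, cand_value)
    return regressions


# Hypothetical scores: the mean rises from 0.35 to 0.40, yet task_b regresses.
baseline = {"task_a": {"ndcg@10": 0.30}, "task_b": {"ndcg@10": 0.40}}
candidate = {"task_a": {"ndcg@10": 0.45}, "task_b": {"ndcg@10": 0.35}}
print(find_regressions(baseline, candidate))
```

A check of this shape makes the per-task view enforceable: an aggregate gain does not pass unless every task holds or improves within the chosen tolerance.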
Claim boundary
This is a reproducible baseline, not a state-of-the-art claim. Exact-search tools are compared only on exact-query workloads; this matrix covers held-out natural-language and code-to-code retrieval. Neural and external embedding upper bounds use the same matrix when their profiles are evaluated.