Pinned Public Evidence

Code-retrieval quality and cost

1000 held-out queries across 4 public CoIR tasks, repeated 3 times. No private corpus or local path is included.

1000 held-out queries
4 public tasks
50 languages
3 repetitions

Aggregate results

Mode    nDCG@10  MRR@10  P@5     R@20    Warm p95   Index size
neural  0.2620   0.2178  0.0561  0.5080  220.14 ms  134.25 MiB

Population variance, phase timings, peak RSS, binary identity, dataset revisions, and checksums are retained in the raw JSON.
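For readers unfamiliar with the column headers, the four quality metrics can be sketched as below. This is a minimal illustration using the textbook definitions with binary relevance; the report's own evaluation harness is not shown here, so the function names, the binary-relevance assumption, and the sample query are all hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked, relevant, k=10):
    """nDCG@k with binary relevance (assumption: each relevant doc has gain 1)."""
    gains = [1.0 if doc in relevant else 0.0 for doc in ranked]
    ideal = [1.0] * min(len(relevant), k)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mrr_at_k(ranked, relevant, k=10):
    """Reciprocal rank of the first relevant doc within the top k, else 0."""
    for i, doc in enumerate(ranked[:k]):
        if doc in relevant:
            return 1.0 / (i + 1)
    return 0.0


def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top k positions that hold a relevant doc."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k=20):
    """Fraction of all relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


# Hypothetical query: one relevant document, retrieved at rank 2.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1"}
print(mrr_at_k(ranked, relevant))        # 0.5
print(precision_at_k(ranked, relevant))  # 0.2
print(recall_at_k(ranked, relevant))     # 1.0
```

Per-query scores like these would then be averaged over queries and runs; how the report weights tasks in its aggregate is recorded in the raw JSON rather than assumed here.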

Change from frozen baseline

The neural mode improves nDCG@10 by 12.77% and MRR@10 by 13.22% over the frozen hash baseline at commit 49b1571de77a. The raw JSON retains every task and run.
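If these percentages are relative changes against the baseline score (an assumption; the report does not say whether they are relative or absolute), the arithmetic is as sketched below. The function name and the input values are illustrative only and are not taken from the report.

```python
def relative_change_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative change")
    return (candidate - baseline) / baseline * 100.0


# Illustrative values, not the report's: a metric moving from 0.20 to 0.25.
print(round(relative_change_pct(0.20, 0.25), 2))  # 25.0
```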

Per-task quality

Task               Mode    nDCG@10  MRR@10  R@20
codetrans-dl       neural  0.2365   0.1454  0.7796
codetrans-contest  neural  0.4269   0.3921  0.5973
cosqa              neural  0.1464   0.1130  0.3413
codefeedback-st    neural  0.5243   0.4897  0.6566

Every retained task remains visible so aggregate improvements cannot hide regressions.
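A small sketch of why per-task visibility matters: an unweighted mean can rise while one task gets worse. The task names and scores below are invented for illustration, and the helper is hypothetical rather than part of the report's tooling.

```python
def find_regressions(baseline, candidate):
    """Return tasks whose candidate score is lower than the baseline score."""
    return sorted(task for task in baseline if candidate[task] < baseline[task])


def mean(scores):
    """Unweighted mean over tasks (an assumed aggregation for this example)."""
    return sum(scores.values()) / len(scores)


# Invented scores: the mean improves even though task-b regresses.
baseline = {"task-a": 0.30, "task-b": 0.40}
candidate = {"task-a": 0.50, "task-b": 0.35}

print(mean(candidate) > mean(baseline))        # True
print(find_regressions(baseline, candidate))   # ['task-b']
```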

Claim boundary

This is a reproducible baseline, not a state-of-the-art claim. Exact-search tools are compared only on exact-query workloads; this matrix covers held-out natural-language and code-to-code retrieval. Neural and external embedding upper bounds use the same matrix when their profiles are evaluated.