Warm p95: 3.57x faster (53.77 ms -> 15.07 ms)
Paired query latency, controlled indexing, full-system paths, and pinned public retrieval quality.
Indexing: 109006 chunks/s
Footprint: 1.06 GiB -> 0.46 GiB
Latency, quality, footprint, recall, and documented indexing ceiling.

| Path | Baseline p95 (ms) | Current p95 (ms) | Ratio (current / baseline) |
|---|---|---|---|
| Process cold | 307.22 | 478.02 | 1.556 |
| Warm distinct | 112.57 | 58.29 | 0.518 |
| Cache replay | 27.85 | 31.96 | 1.148 |
| Filtered | 257.31 | 45.22 | 0.176 |
| Warm CLI | 151.27 | 138.99 | 0.919 |
| Concurrent | 164.74 | 99.17 | 0.602 |
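
The ratio column in the table above is consistent with current p95 divided by baseline p95, so values below 1 mean the current build is faster on that path. A minimal sketch that recomputes it from the published numbers; the names `P95_MS`, `summarize`, and `speedup` are hypothetical and not part of the benchmark tooling:

```python
# p95 latencies in ms, copied from the table: path -> (baseline, current).
P95_MS = {
    "Process cold": (307.22, 478.02),
    "Warm distinct": (112.57, 58.29),
    "Cache replay": (27.85, 31.96),
    "Filtered": (257.31, 45.22),
    "Warm CLI": (151.27, 138.99),
    "Concurrent": (164.74, 99.17),
}


def speedup(baseline_ms: float, current_ms: float) -> float:
    """Speedup factor: how many times faster current is than baseline."""
    return baseline_ms / current_ms


def summarize(rows: dict) -> dict:
    """Return path -> (ratio rounded to 3 places, 'faster' or 'slower')."""
    out = {}
    for path, (baseline_ms, current_ms) in rows.items():
        ratio = current_ms / baseline_ms  # < 1 means current is faster
        out[path] = (round(ratio, 3), "faster" if ratio < 1 else "slower")
    return out


if __name__ == "__main__":
    for path, (ratio, verdict) in summarize(P95_MS).items():
        print(f"{path:14s} {ratio:.3f} {verdict}")
```

Read this way, two paths regress (process cold and cache replay) while the other four improve. The headline 3.57x figure is the inverse view of the same kind of comparison: `speedup(53.77, 15.07)` rounds to 3.57.
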

| Metric | Baseline | Current | Delta |
|---|---|---|---|
| ndcg_at_10 | 0.2620 +/- 0.0002 | 0.2666 +/- 0.0016 | +0.0046 |
| mrr_at_10 | 0.2178 +/- 0.0004 | 0.2220 +/- 0.0014 | +0.0042 |
| precision_at_5 | 0.0561 +/- 0.0002 | 0.0601 +/- 0.0015 | +0.0039 |
| recall_at_20 | 0.5080 +/- 0.0024 | 0.4890 +/- 0.0008 | -0.0190 |
| no_hit_rate | 0.0000 +/- 0.0000 | 0.0000 +/- 0.0000 | +0.0000 |
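
One rough way to read the table above is to compare each delta against the combined `+/-` spread of the two runs. The sketch below assumes the `+/-` values are symmetric run-to-run spreads (the report does not say whether they are standard deviations or intervals), and `exceeds_spread` is a hypothetical helper, not a formal significance test:

```python
# metric -> (baseline, baseline +/-, current, current +/-), copied from the table.
QUALITY = {
    "ndcg_at_10": (0.2620, 0.0002, 0.2666, 0.0016),
    "mrr_at_10": (0.2178, 0.0004, 0.2220, 0.0014),
    "precision_at_5": (0.0561, 0.0002, 0.0601, 0.0015),
    "recall_at_20": (0.5080, 0.0024, 0.4890, 0.0008),
    "no_hit_rate": (0.0000, 0.0000, 0.0000, 0.0000),
}


def exceeds_spread(base: float, base_pm: float, cur: float, cur_pm: float) -> bool:
    """True when |current - baseline| is larger than the two spreads added together."""
    return abs(cur - base) > base_pm + cur_pm


if __name__ == "__main__":
    for metric, row in QUALITY.items():
        base, _, cur, _ = row
        # Deltas recomputed from rounded table values can differ from the
        # published Delta column in the last digit.
        direction = "up" if cur > base else "down" if cur < base else "flat"
        print(f"{metric:15s} {direction:5s} beyond spread: {exceeds_spread(*row)}")
```

Under that assumption, the gains in `ndcg_at_10`, `mrr_at_10`, and `precision_at_5` and the drop in `recall_at_20` all sit outside the combined spread, while `no_hit_rate` is unchanged.
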

| Dataset | Baseline | Current | Delta |
|---|---|---|---|
| codetrans-dl | 0.2365 +/- 0.0005 | 0.2402 +/- 0.0033 | +0.0037 |
| codetrans-contest | 0.4269 +/- 0.0000 | 0.4278 +/- 0.0001 | +0.0009 |
| cosqa | 0.1464 +/- 0.0003 | 0.1566 +/- 0.0041 | +0.0102 |
| codefeedback-st | 0.5243 +/- 0.0000 | 0.5108 +/- 0.0000 | -0.0135 |

The query run kept both daemons live and alternated request order. The controlled indexing run used the same generated corpus for both binaries. The exact full-system run is retained separately because the shared host was heavily saturated.
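
The paired query methodology can be illustrated with a short sketch: both targets stay up, each query is sent to both, and the order flips on every iteration so that neither side consistently benefits from going second. This is an assumption-laden illustration, not the harness used for the numbers above; `paired_latencies`, `run_baseline`, `run_current`, and the nearest-rank `p95` definition are all hypothetical choices:

```python
import time
from typing import Callable, List, Sequence, Tuple


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def paired_latencies(
    queries: Sequence[str],
    run_baseline: Callable[[str], object],
    run_current: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[float], List[float]]:
    """Time every query against both targets, alternating which one goes first."""
    baseline_ms: List[float] = []
    current_ms: List[float] = []
    for i, query in enumerate(queries):
        order = [(run_baseline, baseline_ms), (run_current, current_ms)]
        if i % 2 == 1:
            order.reverse()  # odd iterations hit the current build first
        for run, sink in order:
            start = clock()
            run(query)
            sink.append((clock() - start) * 1000.0)  # seconds -> ms
    return baseline_ms, current_ms
```

In a real run, `run_baseline` and `run_current` would each send the query to one of the two live daemons, and `p95` would be applied to each returned list. The `clock` parameter exists only so the timing logic can be exercised deterministically.
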