Public legal citation challenge

Can your AI recover the right legal citation from context alone?

We give you 200 real Illinois opinion excerpts, each with one citation removed. Your system fills in the blank. LawEngine scores the answers against a private grading key.

200 masked excerpts · Blind server-side grader · Reproducible seed · Per-family scoring

One benchmark session. One blind grade.

We generate a frozen Illinois challenge set, you run your system locally, and we score your predictions against a private grading key.

Generate

Freeze a 200-row challenge

We create a reproducible benchmark from real Illinois opinion excerpts and lock it to a benchmark id.

Predict

Recover the missing citation

Run your system locally and fill in a single prediction column, predicted_citation, for each row id.

Score

See top-line and per-family results

Upload your answers and get accuracy, confidence interval, and family-level breakdowns.

Generate, download, and score in one place.

Generate is the default path; Resume is available when you already have a benchmark id to reopen.

Start your challenge

Generate a fresh 200-row Illinois challenge.

Use an explicit seed so the frozen set is reproducible. The current production pool version is dcv-pool-v1.

Enter an integer from 1 to 9999. If you keep the same seed, repeated generation resolves to the same frozen benchmark.
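
To illustrate why a fixed seed resolves to the same frozen benchmark, here is a minimal sketch of deterministic sampling, assuming a local pool CSV and a simple random.Random-based selection; the actual server-side generator for dcv-pool-v1 is not published, so treat every name here as hypothetical.

    import csv
    import random

    def freeze_challenge(pool_path: str, seed: int, n_rows: int = 200) -> list[dict]:
        """Pick n_rows from the pool deterministically: same seed, same set."""
        if not 1 <= seed <= 9999:
            raise ValueError("seed must be an integer from 1 to 9999")
        with open(pool_path, newline="", encoding="utf-8") as f:
            pool = list(csv.DictReader(f))
        # A private Random instance seeded the same way always yields the
        # same sample, which is what makes the frozen set reproducible.
        return random.Random(seed).sample(pool, n_rows)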

Keep or share a benchmark id when you want to reopen the same frozen set.

Your challenge

No benchmark yet

Generate a challenge to unlock the benchmark summary, downloads, and blind scoring workflow.

200 rows · dcv-pool-v1

Family mix

See the composition of this frozen challenge.

Generate or resume a challenge to view the benchmark family mix.

Download challenge files

Download the challenge bundle.

Use the public challenge files locally, then come back with a prediction CSV for blind scoring.

Score your predictions

Upload your prediction CSV and score it.

Upload a CSV with exactly the columns id,predicted_citation. Validation runs before the score request is sent.

Required header: id,predicted_citation
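
Since validation runs before anything is uploaded, you can replicate the basic checks locally. This is a sketch under stated assumptions: the required header comes from this page, while the 200-row expectation and the duplicate-id check are extra sanity checks, not documented server rules.

    import csv

    REQUIRED_HEADER = ["id", "predicted_citation"]

    def validate_predictions(path: str, expected_rows: int = 200) -> None:
        """Fail fast on obvious CSV problems before the score request."""
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            if next(reader, None) != REQUIRED_HEADER:
                raise ValueError("header must be exactly id,predicted_citation")
            rows = list(reader)
        if len(rows) != expected_rows:  # the benchmark is a 200-row frozen set
            raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
        ids = [r[0] for r in rows]
        if len(ids) != len(set(ids)):
            raise ValueError("duplicate id values found")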

See your results

Top-line accuracy, confidence interval, and per-family scoring stay attached to the active benchmark id.

No score yet

Your benchmark results will appear here.

Upload a prediction CSV after you generate or resume a challenge. This section will show the blind score for that frozen set.

Accuracy: --
95% CI: --
Correct count: --
Claims scored: --

Exact CSV formats

Challenge download CSV columns:

id,query,case_name,year,court,family,authority_type,source_url

Prediction upload CSV header:

id,predicted_citation
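
To make the two formats concrete, a round trip from download to upload could look like the sketch below. The file names and the predict_citation stand-in are hypothetical; swap in your own system's call.

    import csv

    def predict_citation(query: str) -> str:
        """Hypothetical stand-in for your citation-recovery system."""
        raise NotImplementedError

    # Read the challenge download (id,query,case_name,year,court,family,...).
    with open("challenge.csv", newline="", encoding="utf-8") as f:
        challenge = list(csv.DictReader(f))

    # Write the prediction upload with exactly the required header.
    with open("predictions.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "predicted_citation"])
        for row in challenge:
            writer.writerow([row["id"], predict_citation(row["query"])])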

Current benchmark metadata

Generate or resume a benchmark to populate live metadata for this section.

How the blind benchmark works

Each benchmark is a frozen 200-row Illinois challenge set. Public downloads contain only the prompts and metadata needed to run the challenge locally.

The grading key stays server-side. Scores are resolved against the active benchmark id and its manifest hash so submissions are always tied to one frozen set.
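
The page does not specify how the manifest hash is built, but the general technique of pinning a frozen set to a content hash looks like this; the SHA-256 choice and the single-file layout are assumptions for illustration.

    import hashlib

    def manifest_hash(path: str) -> str:
        """Content hash of a downloaded challenge file; if the bytes change,
        the hash changes, so a score can be tied to one exact frozen set.
        (Hash construction assumed, not taken from LawEngine's docs.)"""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()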

What the confidence interval means

The score response reports a 95% Wilson confidence interval computed by the backend. The interval helps you judge how much trust to place in the accuracy figure at the current sample size; the page displays that statistic as returned and does not estimate or modify it.
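
For reference, the 95% Wilson score interval for k correct answers out of n scored claims follows the standard formula below; this sketch mirrors the textbook computation, not LawEngine's backend code.

    from math import sqrt

    def wilson_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
        """Standard Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
        if total == 0:
            return (0.0, 0.0)
        p = correct / total
        denom = 1 + z * z / total
        center = (p + z * z / (2 * total)) / denom
        half = (z / denom) * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
        return (max(0.0, center - half), min(1.0, center + half))

For example, wilson_ci(160, 200) gives roughly (0.74, 0.85), so an 80% accuracy on 200 claims carries a spread of about eleven percentage points.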

Resume and sharing notes

The page stores the last benchmark id in localStorage and also persists it to the URL as ?benchmark_id=....

Sharing a benchmark URL lets another person open the same frozen challenge set, as long as they keep the benchmark id intact.
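
A minimal sketch of how that share URL round-trips, assuming a hypothetical host (this page excerpt does not state the real one):

    from urllib.parse import urlencode, urlparse, parse_qs

    BASE = "https://lawengine.example/challenge"  # hypothetical host

    def share_url(benchmark_id: str) -> str:
        """Build a shareable link carrying the benchmark id."""
        return f"{BASE}?{urlencode({'benchmark_id': benchmark_id})}"

    def benchmark_id_from(url: str) -> str | None:
        """Recover the benchmark id from a shared link, if present."""
        return parse_qs(urlparse(url).query).get("benchmark_id", [None])[0]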