Freeze a 200-row challenge
We create a reproducible benchmark from real Illinois opinion excerpts and lock it to a benchmark id.
Public legal citation challenge
We give you 200 real Illinois opinion excerpts with one citation removed. Your system fills in the blank. LawEngine scores the answers against a private grading key.
How it works
We generate a frozen Illinois challenge set, you run your system locally, and we score your predictions against a private grading key.
Run your system locally and fill in one answer column, predicted_citation.
Upload your answers and get accuracy, confidence interval, and family-level breakdowns.
Active challenge workspace
Generate is the default path; use Resume when you already have a benchmark id and want to reopen its frozen set.
Use an explicit seed so the frozen set is reproducible. The current production pool version is dcv-pool-v1.
Enter an integer from 1 to 9999. If you keep the same seed, repeated generation resolves to the same frozen benchmark.
Keep or share a benchmark id when you want to reopen the same frozen set.
Generate a challenge to unlock the benchmark summary, downloads, and blind scoring workflow.
Generate or resume a challenge to view the benchmark family mix.
Use the public challenge files locally, then come back with a prediction CSV for blind scoring.
Upload a CSV whose header is exactly id,predicted_citation. Validation runs before the score request is sent.
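The header check can also be run on your side before uploading. A minimal sketch in Python; the exact server-side rules are not documented here, so the 200-row count and unique-id checks are assumptions:

```python
import csv
import io

def validate_predictions(csv_text: str, expected_rows: int = 200) -> list[str]:
    """Return validation errors; an empty list means the CSV looks acceptable."""
    errors: list[str] = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    # The upload header must be exactly: id,predicted_citation
    if header != ["id", "predicted_citation"]:
        errors.append("header must be exactly: id,predicted_citation")
    rows = [r for r in reader if r]
    # Assumption: one row per challenge prompt, ids unique.
    if len(rows) != expected_rows:
        errors.append(f"expected {expected_rows} rows, found {len(rows)}")
    ids = [r[0] for r in rows]
    if len(ids) != len(set(ids)):
        errors.append("duplicate ids found")
    return errors
```

Running this before upload catches header and row-count mistakes without spending a score request.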
Results
Top-line accuracy, confidence interval, and per-family scoring stay attached to the active benchmark id.
Upload a prediction CSV after you generate or resume a challenge. This section will show the blind score for that frozen set.
Challenge download CSV columns:
id,query,case_name,year,court,family,authority_type,source_url

Prediction upload CSV header:
id,predicted_citation

Generate or resume a benchmark to populate live metadata for this section.
Each benchmark is a frozen 200-row Illinois challenge set. Public downloads contain only the prompts and metadata needed to run the challenge locally.
The grading key stays server-side. Scores are resolved against the active benchmark id and its manifest hash so submissions are always tied to one frozen set.
The score response reports a 95% Wilson confidence interval computed by the backend. The interval helps interpret accuracy at the current sample size; the page displays that statistic as returned and does not recompute or adjust it.
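For reference, a 95% Wilson score interval can be reproduced locally from the correct count and the total. A minimal sketch; the backend's exact rounding may differ:

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a proportion (z = 1.96)."""
    if total == 0:
        return (0.0, 0.0)
    p = correct / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2))
    return (center - half, center + half)
```

At 150/200 correct this gives roughly (0.686, 0.805): the interval is asymmetric around 0.75, which is why Wilson is preferred over the naive normal interval at this sample size.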
The page stores the last benchmark id in localStorage and also persists it to the URL as ?benchmark_id=....
Sharing a benchmark URL lets another person open the same frozen challenge set, as long as they keep the benchmark id intact.
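If you build a shareable link programmatically, the only requirement is that the benchmark_id query parameter survives intact. A hypothetical Python sketch (the page itself manages this in the browser; this is just for scripts that generate links):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_benchmark_id(url: str, benchmark_id: str) -> str:
    """Return `url` with ?benchmark_id=... set, preserving any other query params."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["benchmark_id"] = benchmark_id
    return urlunparse(parts._replace(query=urlencode(query)))
```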