OCR + AI extraction + DMS integration for a state court records office that needed paper archives turned into structured, queryable data.
Duration
6 months · still on retainer
Team
5 specialists
Practices
Digitalization
30,000
Files digitised
Days → mins
Retrieval time
9%
QA touch rate
100%
Chain of custody
A state court records office held 30,000 case files in physical archives. Retrieving a single document for a research request often took two to three working days. The records team needed the paper converted into a searchable archive, with case metadata extracted into structured fields, while preserving the chain of custody for evidence-related material.
01
Surveyed the archive and identified the seven document classes — filings, judgments, exhibits, correspondence, transcripts, dockets, motions — that accounted for 92% of volume.
02
Built an OCR pipeline tuned for legal typography and handwritten annotations, with a fallback to a vision-language model for low-confidence pages.
03
Trained extractors for case number, filing date, document class, and parties — gated by a clerk-reviewable confidence threshold.
04
Wrote a thin integration layer that pushed structured records straight into the existing records management system without changing the front-of-house workflow.
05
Shipped a lightweight review console for the QA team to triage low-confidence pages and chain-of-custody edge cases.
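As a rough illustration of how the confidence gating in steps 02, 03 and 05 could fit together, here is a minimal Python sketch. It is not the production code: the threshold values, the `ExtractedField` and `RoutingDecision` types, the `route_page` function and the sample case number are all hypothetical, and it assumes that "gated by a confidence threshold" means any field scoring below the threshold sends the page to the review console.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical thresholds; the case study does not state the real values.
OCR_FALLBACK_THRESHOLD = 0.80   # below this, re-read the page with the vision-language model
FIELD_REVIEW_THRESHOLD = 0.90   # below this, a clerk reviews the extracted field


@dataclass
class ExtractedField:
    name: str          # e.g. "case_number", "filing_date", "document_class", "parties"
    value: str
    confidence: float  # extractor's score in [0, 1]


@dataclass
class RoutingDecision:
    use_vlm_fallback: bool
    queue: str                                   # "auto" or "review"
    flagged_fields: List[str] = field(default_factory=list)


def route_page(ocr_confidence: float, fields: List[ExtractedField]) -> RoutingDecision:
    """Decide whether a page needs the fallback model and whether a clerk must review it."""
    use_fallback = ocr_confidence < OCR_FALLBACK_THRESHOLD
    flagged = [f.name for f in fields if f.confidence < FIELD_REVIEW_THRESHOLD]
    queue = "review" if flagged else "auto"
    return RoutingDecision(use_fallback, queue, flagged)


if __name__ == "__main__":
    # Illustrative values only.
    decision = route_page(
        ocr_confidence=0.95,
        fields=[
            ExtractedField("case_number", "CV-0000-000", 0.99),
            ExtractedField("filing_date", "1987-03-12", 0.62),
        ],
    )
    print(decision.queue, decision.flagged_fields)
```

Keeping the OCR threshold separate from the field threshold means a cleanly scanned page with an ambiguous handwritten date can still reach a clerk without being re-processed by the fallback model.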
“The pipeline has run continuously for two years with no incidents — and freed the records team to focus on research requests instead of paper hunting.”