A 30,000-file records pipeline for a state court system.

OCR, AI extraction, and document management system (DMS) integration for a state court records office that needed its paper archives turned into structured, queryable data.

Duration

6 months · still on retainer

Team

5 specialists

Practices

Digitalization

What shipped, in numbers.

30,000

Files digitised

Days → mins

Retrieval time

9%

QA touch rate

100%

Chain of custody

The challenge.

A state court records office held 30,000 case files in physical archives. Retrieving a single document for a research request often took two to three working days. The records team needed the paper turned into a searchable archive, with case metadata extracted into structured fields and the chain of custody preserved for evidence-related material.

Engineering lead
Backend engineer
ML engineer
Records SME
QA

How we approached it.

01

Surveyed the archive and identified the seven document classes — filings, judgments, exhibits, correspondence, transcripts, dockets, motions — that accounted for 92% of volume.

02

Built an OCR pipeline tuned for legal typography and handwritten annotations, with a fallback to a vision-language model for low-confidence pages.

03

Trained extractors for case number, filing date, document class, and parties — gated by a clerk-reviewable confidence threshold.

04

Wrote a thin integration layer that pushed structured records straight into the existing records management system without changing the front-of-house workflow.

05

Shipped a lightweight review console for the QA team to triage low-confidence pages and chain-of-custody edge cases.
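
The confidence gating described in steps 02, 03, and 05 can be sketched roughly as follows. This is a minimal illustration, not the production code: the threshold values, the class and function names, and the sample case number are all assumptions made up for the example, since the case study does not publish them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical thresholds; the real values are not stated in the case study.
OCR_FALLBACK_THRESHOLD = 0.80   # below this, re-read the page with the fallback model
FIELD_REVIEW_THRESHOLD = 0.90   # below this, a clerk reviews the extracted field


@dataclass
class ExtractedField:
    name: str          # e.g. "case_number", "filing_date", "document_class", "parties"
    value: str
    confidence: float  # extractor confidence in [0, 1]


@dataclass
class PageResult:
    page_id: str
    ocr_confidence: float
    fields: list[ExtractedField] = field(default_factory=list)


def fields_for_review(page: PageResult) -> list[str]:
    """Names of extracted fields whose confidence falls below the review threshold."""
    return [f.name for f in page.fields if f.confidence < FIELD_REVIEW_THRESHOLD]


def route(page: PageResult) -> str:
    """Decide where a page goes next: fallback model, clerk review, or auto-accept."""
    if page.ocr_confidence < OCR_FALLBACK_THRESHOLD:
        return "vlm_fallback"
    if fields_for_review(page):
        return "clerk_review"
    return "auto_accept"


# Illustrative page with made-up values: clean OCR, one uncertain field.
page = PageResult(
    page_id="p1",
    ocr_confidence=0.95,
    fields=[
        ExtractedField("case_number", "2019-CV-0412", 0.97),
        ExtractedField("filing_date", "2019-03-14", 0.72),
    ],
)
print(route(page))  # clerk_review
```

Keeping the routing decision in one small function means the review thresholds can be tuned against the QA touch rate without touching the OCR or extraction stages.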

“The pipeline has run continuously for two years with no incidents — and freed the records team to focus on research requests instead of paper hunting.”

Practices that ran this brief.

Digitalization

Bulk scanning, OCR, and indexing — paper archives turned into searchable, structured digital assets.

