Posted Feb 13, 2026

Develop Existing NLP Legal Technology Software

Our software is a Python/Flask backend that identifies business-development-ready federal litigation opportunities from CourtListener and generates structured lead briefs and export bundles. The system detects high-signal discovery triggers (e.g., MTD denied, MTC granted, ESI protocol, 502(d), Rule 26(f)) from CourtListener HTML docket entries and PDF dockets (both are equally important), then enriches results via strict-JSON LLM summaries, scoring, and data-scope estimation. It uses SQLite caching and produces CSV/JSON/NDJSON/ZIP outputs.

We’re looking for someone to materially improve trigger-detection accuracy, with a strong emphasis on reducing false negatives (missed triggers) across both HTML docket text and PDF-extracted text/OCR.

We already have a labeled dataset:
- ~9 discovery triggers
- ~100 labeled docket PDFs per trigger
- For each trigger: ~50 true positives + ~50 false-positive samples (ground-truth labels)

Responsibilities:

1) Improve text extraction + normalization
- Audit PDF extraction quality and upgrade the hybrid pipeline (native text + OCR fallback).
- Normalize docket artifacts (pagination, headers/footers, spacing, line wraps, table-like formatting).
- Add extraction “confidence” signals to drive fallbacks and debugging.

2) Upgrade trigger detection (recall-forward)
- Strengthen existing regex/heuristic patterns to capture more true positives.
- Add context-sensitive logic (windowing, docket-entry boundaries, negative patterns, temporal phrasing).
- Implement disambiguation (e.g., “denied” vs. “recommended denial”; “filed” vs. “denied”; “granted in part”).
- Ensure improvements apply to both HTML and PDF sources.

3) Build a professional evaluation + error-analysis loop
- Produce reproducible runs with per-trigger precision/recall/F1 and confusion breakdowns.
- Create “miss analysis” tooling: for each false negative, show why it didn’t fire plus suggested rule updates.
- Add regression tests (pytest) to prevent future detection drift.
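To give candidates a concrete picture of item 2, here is a minimal sketch of recall-forward trigger detection with negative-pattern windowing. The trigger names, patterns, and window size below are illustrative assumptions, not the production rules:

```python
import re

# Hypothetical sketch: each trigger pairs a broad (recall-forward) pattern
# with negative patterns that veto a match when they appear in the same
# context window (e.g., "recommended denial" should not fire "MTD denied").
TRIGGERS = {
    "mtd_denied": {
        "pattern": re.compile(r"motion\s+to\s+dismiss.{0,80}?\bdenied\b", re.I | re.S),
        "negative": [
            re.compile(r"recommend(?:ed|s|ation)\b.{0,40}?\bdenial", re.I),
            re.compile(r"\bdenied\s+in\s+part\b", re.I),
        ],
    },
    "esi_protocol": {
        "pattern": re.compile(r"\bESI\s+protocol\b", re.I),
        "negative": [],
    },
}

WINDOW = 200  # characters of surrounding context checked for negative patterns


def detect_triggers(text: str) -> list[tuple[str, int]]:
    """Return (trigger_name, match_offset) pairs, vetoing any match whose
    surrounding window contains a negative pattern."""
    hits = []
    for name, spec in TRIGGERS.items():
        for m in spec["pattern"].finditer(text):
            lo, hi = max(0, m.start() - WINDOW), m.end() + WINDOW
            window = text[lo:hi]
            if any(neg.search(window) for neg in spec["negative"]):
                continue
            hits.append((name, m.start()))
    return hits


entry = "ORDER: Defendant's Motion to Dismiss is DENIED. The parties shall submit an ESI protocol."
print(detect_triggers(entry))
```

The key recall-forward idea is that the positive pattern stays loose (case-insensitive, tolerant of intervening text) while precision is recovered by vetoing matches in context, rather than by tightening the positive pattern itself.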
4) Integrate cleanly into the existing Flask codebase
- Contribute PRs with clear documentation and maintainable structure.
- Keep performance reasonable (avoid over-OCR; cache smartly; minimize repeated parsing).

Success criteria (what we will measure):
- Demonstrated improvement in recall (primary) and overall F1 on the labeled dataset.
- Reduction in the top false-negative root causes (extraction failures, phrasing variants, formatting artifacts).
- A maintainable trigger framework: easier to add triggers and safer to iterate on.

Required experience:
- Strong Python backend engineering (clean code, tests, reproducibility).
- Prior experience with PDF parsing + OCR workflows (hybrid approaches strongly preferred).
- Experience building or tuning rule-based NLP / text-classification systems with evaluation harnesses.
- Comfort with Flask, SQLite, pandas outputs, and pytest.

Nice to have:
- Legal-tech / docket familiarity.
- Experience designing labeling workflows and active error mining.
- Experience with LLM structured outputs (strict JSON) plus validation/normalization layers.

Expected engagement:
- Contract / freelance; milestone-based preferred.
- The listed budget is flexible; we are open to negotiation.
- Async-friendly.
- Potential extension after the initial accuracy lift.
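For candidates wondering how we will measure the success criteria above, here is a minimal sketch of the per-trigger evaluation loop. The data layout (sets of document IDs per trigger) and names are hypothetical, not our actual harness:

```python
# Minimal sketch of per-trigger evaluation of the kind described under
# "Success criteria". Gold labels and predictions are represented as
# sets of document IDs keyed by trigger name; all names are illustrative.

def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluate(gold: dict[str, set[str]], pred: dict[str, set[str]]) -> dict[str, dict]:
    """Per-trigger precision/recall/F1, plus the false-negative doc IDs
    that a 'miss analysis' tool would then explain."""
    report = {}
    for trigger, gold_docs in gold.items():
        pred_docs = pred.get(trigger, set())
        tp = len(gold_docs & pred_docs)
        fp = len(pred_docs - gold_docs)
        fn = len(gold_docs - pred_docs)
        p, r, f = prf1(tp, fp, fn)
        report[trigger] = {
            "precision": p,
            "recall": r,
            "f1": f,
            "false_negatives": sorted(gold_docs - pred_docs),
        }
    return report


gold = {"mtd_denied": {"d1", "d2", "d3"}}
pred = {"mtd_denied": {"d1", "d2", "d4"}}
print(evaluate(gold, pred))
```

Because recall is the primary metric, a run like this would surface the `false_negatives` list first, feeding the miss-analysis tooling described in item 3.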