Half of the engineering knowledge at any service company is stuck inside PDFs — contracts, datasheets, commissioning reports, supplier invoices, field notes, manuals. The machine that reads them all is cheaper than it used to be, smarter than it has any right to be, and never complains about the handwriting.
DEMO
Synthetic documents. Sample files reference the invented Cryovox compressor, Helix drive, and Torquon motor from the Diagnostics page. Layouts, field structures, and extraction patterns mirror real-world practice — but no confidential customer contracts, proprietary datasheets, or actual invoices are reproduced here.
01 Discipline
What document intelligence actually is
Structure from the unstructured.
A document is an argument the author made with layout and prose. A human reader reconstructs the argument in their head — this is the party name, this is the date, this clause modifies that one, this number is a total and that number is a line item. The reconstruction happens fast and unconsciously, which is why it feels like nothing until you have to do it on 200 invoices. Then it is everything.
Document intelligence is the discipline of getting a machine to do that reconstruction at volume, with enough ground-truth checks that a human only needs to audit the uncertain cases. Vision models do the reading. Structured-output schemas enforce the shape of the answer. Confidence scores tell you what to trust and what to double-check. The output is not a summary — it is a dataset.
Live demo · Watch the extraction happen
02 Toolkit
What's in the extraction bench
Model, schema, and a reviewer.
The extraction stack has three layers that have to work together. The vision-capable model does the actual reading — looking at the pixels, resolving the layout, parsing the prose. The structured-output schema enforces the shape of the answer, so the model can't hallucinate new fields or omit required ones. And a human reviewer audits the confidence-flagged cases before they enter the system of record. Skip any one layer and the output becomes unreliable; skip two and the whole pipeline is worse than doing the work by hand.
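Wired together, the three layers are only a few lines of glue. A minimal sketch, with a placeholder standing in for the vision-model call; the field names, confidence values, and threshold are illustrative:

```python
# Three-layer sketch: model reads, schema constrains, humans review.
# `read_document` is a placeholder for a real vision-model call.

REQUIRED = {"vendor", "invoice_number", "total"}
THRESHOLD = 0.85

def read_document(pdf_bytes):
    # Placeholder: a real pipeline sends the PDF to a vision model and
    # gets back {field: {"value": ..., "confidence": ...}}.
    return {
        "vendor": {"value": "Cryovox GmbH", "confidence": 0.97},
        "invoice_number": {"value": "CV-2291", "confidence": 0.93},
        "total": {"value": 1840.00, "confidence": 0.71},
    }

def run_pipeline(pdf_bytes):
    fields = read_document(pdf_bytes)
    missing = REQUIRED - fields.keys()
    if missing:  # schema layer: required fields must be present
        raise ValueError(f"schema violation, missing: {missing}")
    # reviewer layer: anything under threshold goes to a human
    review = {k: v for k, v in fields.items() if v["confidence"] < THRESHOLD}
    commit = {k: v for k, v in fields.items() if v["confidence"] >= THRESHOLD}
    return commit, review
```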
Vision
Claude · Sonnet 4.5
Reads PDFs, scanned images, and photographs natively. Understands tables, handles rotated pages, parses multilingual content without a separate OCR stage.
Schema
Structured JSON output
Every extraction job declares its required fields, optional fields, and validation constraints. The model fills the schema; it cannot invent new shape.
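One way such a schema might look, sketched as JSON Schema with illustrative field names: `required` stops the model omitting fields, and `"additionalProperties": false` stops it inventing new ones.

```json
{
  "type": "object",
  "additionalProperties": false,
  "required": ["vendor", "invoice_number", "total"],
  "properties": {
    "vendor": {"type": "string"},
    "invoice_number": {"type": "string"},
    "issue_date": {"type": "string", "format": "date"},
    "total": {"type": "number", "minimum": 0}
  }
}
```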
Confidence
Per-field scoring
Each extracted value carries a confidence score. Values below threshold get routed to human review; values above get written straight to the database.
Routing
Field-aware workflow
A total over €10k routes to approval; a new vendor routes to AP; a flagged lead-time routes to procurement. Extraction is the beginning, not the end.
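That routing logic can be sketched as a rule table; the predicates, field names, and queue names below are illustrative:

```python
# Field-aware routing: each rule pairs a predicate on the extracted
# record with the queue it routes to. Thresholds are illustrative.

RULES = [
    (lambda r: r.get("total_eur", 0) > 10_000, "approval"),
    (lambda r: r.get("vendor_is_new", False), "accounts_payable"),
    (lambda r: r.get("lead_time_weeks", 0) > 8, "procurement"),
]

def route(record):
    queues = [queue for predicate, queue in RULES if predicate(record)]
    return queues or ["auto_commit"]  # nothing flagged: straight through
```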
Provenance
Source-linked fields
Every extracted value remembers where on which page it came from. Audit trail, legal defensibility, and fast human verification all come from this one discipline.
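A provenance-carrying value might be modelled like this; the field names, the example filename, and the citation format are assumptions, not a fixed standard:

```python
from dataclasses import dataclass

# A value that remembers where it came from: file, page, pixel region.

@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: object
    confidence: float
    source_file: str   # archive identifier of the original document
    page: int          # 1-based page number
    bbox: tuple        # (x0, y0, x1, y1) pixel region on that page

    def citation(self) -> str:
        # One-click answer to "where does this number come from?"
        return f"{self.source_file}, page {self.page}, region {self.bbox}"
```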
Reviewer
The human in the loop
Rare but essential. For unfamiliar document types, low-confidence fields, and values that touch money or safety, a human reviews before commit.
03 Process
Six steps · end-to-end pipeline
From inbox to database row.
A good pipeline is a conveyor belt, not a crane lift. Each stage takes a standardised input and produces a standardised output, so the next stage can start without knowing what happened before. The whole thing runs on autopilot 95% of the time; the human only gets involved at the exceptions.
01
Ingest
File in
A PDF arrives from email, a supplier portal, a scanner, or a mobile photo. Normalise the format, compute a content hash, check for duplicates, store the original in the archive. Nothing else happens until the file is safely stored with a stable identifier.
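The ingest step fits in a few lines, with a dict standing in for real object storage:

```python
import hashlib

# Ingest sketch: stable identifier from content, duplicate check before
# anything else runs. ARCHIVE stands in for real object storage.

ARCHIVE = {}

def ingest(pdf_bytes: bytes) -> str:
    doc_id = hashlib.sha256(pdf_bytes).hexdigest()[:16]
    if doc_id in ARCHIVE:
        return doc_id            # duplicate: same content, same identifier
    ARCHIVE[doc_id] = pdf_bytes  # store the original before any processing
    return doc_id
```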
02
Segment
Page-aware
Split multi-page files into logical sections — a 23-page contract might have cover / parties / terms / annex / signatures. Models process segments better than whole books, and segmentation is where a tricky document reveals its structure.
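A toy segmenter, splitting wherever a page opens with a heading-like line; the heading list is illustrative, and real segmenters also use layout and font signals:

```python
import re

# Segment sketch: group 1-based page numbers into logical sections,
# starting a new section when a page's first line looks like a heading.

HEADING = re.compile(r"^(ANNEX|SIGNATURES|COMMERCIAL TERMS|PARTIES)", re.I)

def segment(pages):
    """pages: list of page texts; returns groups of 1-based page numbers."""
    if not pages:
        return []
    sections = [[1]]
    for i, text in enumerate(pages[1:], start=2):
        first_line = text.strip().splitlines()[0] if text.strip() else ""
        if HEADING.match(first_line):
            sections.append([i])       # heading opens a new section
        else:
            sections[-1].append(i)     # otherwise continue the current one
    return sections
```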
03
Extract
Schema-bound
Run the vision model against each segment with the target JSON schema in the prompt. The model returns structured values plus per-field confidence. For fields that span multiple pages (a total, a lead time buried on page 6), the extractor works across segments.
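The extract step, sketched with a canned model reply in place of a live API call; a real pipeline would get this JSON string back from the vision model, with the target schema included in the prompt:

```python
import json

# Extract sketch: parse the model's reply against the declared schema
# fields and reject anything the model invented. MODEL_REPLY simulates
# a vision-model response; values are illustrative.

SCHEMA_FIELDS = ["parties", "unit_price", "lead_time_weeks"]

MODEL_REPLY = json.dumps({
    "parties": {"value": ["Cryovox GmbH", "Buyer A/S"], "confidence": 0.96},
    "unit_price": {"value": 18400.0, "confidence": 0.94},
    "lead_time_weeks": {"value": 14, "confidence": 0.71},
})

def parse_extraction(reply: str) -> dict:
    data = json.loads(reply)
    unknown = set(data) - set(SCHEMA_FIELDS)
    if unknown:  # the schema is a contract, not a suggestion
        raise ValueError(f"model invented fields: {unknown}")
    return data
```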
04
Validate
Rules engine
Cross-check extracted values against business rules. Does the VAT math add up? Does the line-item total match the bottom-line total? Is the vendor on the approved list? Validation catches both model errors and source-document errors.
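The totals and VAT checks from this step, as a minimal sketch with an illustrative rounding tolerance:

```python
# Validate sketch: cross-checks that catch both model errors and
# source-document errors. The tolerance absorbs rounding on the source side.

def validate_invoice(lines, net, vat_rate, gross, tol=0.01):
    errors = []
    if abs(sum(lines) - net) > tol:
        errors.append("line items do not sum to net total")
    if abs(net * (1 + vat_rate) - gross) > tol:
        errors.append("VAT math does not add up")
    return errors
```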
05
Route
Field-aware
High-confidence, fully-validated extractions flow straight to the target system. Low-confidence or rule-flagged extractions route to a human reviewer with the source highlighted at the right page and span. No extraction is ever committed silently.
06
Store
Provenance preserved
The final record links back to the exact page and pixel region it came from. Six months later, when an auditor or a procurement manager asks "where does this number come from?", the answer is one click away. Provenance is what separates extraction from guessing.
04 Patterns
Six extraction jobs from the real field
The jobs that actually earn the stack.
These are the extraction patterns that come up again and again in an industrial service business. Each one has its own schema, its own validation rules, and its own failure modes. Together they cover most of what the office does with paper in any given week.
Contract term extraction
Purchase agreements, NDAs, supplier frameworks. Pull parties, pricing, lead times, warranty, payment terms, governing law, liability caps. Red-flags anything unusual — long lead times, uncapped liability, auto-renewal clauses.
Datasheet spec normalisation
Turn a 40-page manufacturer PDF into a comparable spec row. Capacity, voltage, current, dimensions, weight, IP rating, operating envelope, connection types. Normalises units across vendors so real comparison becomes possible.
Service-report structuring
The technician writes a few sentences, takes six photos, reads some parameters off the HMI. The model turns that into a structured service report: symptoms, measurements, root cause, parts replaced, recommendations, sign-off.
symptoms · measurements · root cause · parts · hours · sign-off
Parts-list reconciliation
Supplier invoice lines vs. BOM vs. project budget. Matches ambiguous part names ("MC-3 gasket set" = "Mycom MC-3 crankcase gasket kit"), flags price drift, catches missing items, catches duplicate billing.
part match · qty check · price drift · duplicates · missing
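The ambiguous-name matching can be approximated with the standard library's `difflib`; the similarity threshold here is illustrative, and a production matcher would add part-number normalisation and vendor aliases:

```python
from difflib import SequenceMatcher

# Parts-matching sketch: score an invoice line against BOM entries with
# a plain string-similarity ratio and keep the best match above threshold.

def best_match(invoice_line: str, bom: list, threshold: float = 0.5):
    def score(entry):
        return SequenceMatcher(None, invoice_line.lower(), entry.lower()).ratio()
    entry = max(bom, key=score)
    return entry if score(entry) >= threshold else None
```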
Multilingual invoice intake
Incoming invoices in Danish, German, Italian, Romanian. Language-agnostic extraction — vendor name, number, dates, VAT, totals — normalised to a single schema. The AP team sees one format regardless of source.
vendor · invoice # · dates · currency · vat · total
Handwritten field-note OCR
A phone photo of a technician's pocket-notebook page. Pulls out what was measured, what was replaced, what was recommended — and preserves ambiguity where the handwriting is genuinely unreadable, rather than guessing.
readings · actions · recommendations · uncertain
05 Try it
Pick a sample · simulated extraction
Drop in a document, watch it come apart.
This demo uses pre-computed synthetic extractions so the page works offline and preserves privacy — no API keys, no backend, no uploads sent anywhere. The structured outputs below are what a real Claude-powered pipeline produces on similar documents; the formatting, field shape, confidence scores, and red-flag logic all mirror production behaviour.
Sample document
3 samples · pre-loaded · live extraction disabled
Or drop a PDF here
Live API disabled on the public demo · upload is simulated
06 Featured case
The contract that hid a 14-week lead time
"Page 6 of 23 said everything."
A standard supplier purchase agreement arrives in the office queue. Twenty-three pages, three signatories, English throughout, nothing visually alarming. The procurement lead runs it through extraction because that's the routine now — she's not expecting anything. The output comes back with a single amber flag: lead time, 14 weeks, confidence 71%, flagged for review.
00:02
Ingest
PDF arrives attached to an email from the supplier. Hash computed, duplicate check clean, file stored. The contract is 23 pages, roughly 8,400 words.
00:12
Segment
Five logical sections detected — cover, parties & pricing, commercial terms, technical annex, signatures. The lead-time clause lives in the commercial terms block on page 6, buried between an invoicing paragraph and a force-majeure clause.
00:26
Extract
The extractor returns the expected fields: parties, product, unit price, quantity, warranty, payment, governing law. And one more: lead time · 14 weeks · confidence 71%. Confidence is low because the phrasing is unusual — the clause doesn't use the phrase "lead time" anywhere; it says "delivery shall occur within 98 calendar days of supplier's acceptance of order".
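Catching that phrasing is a normalisation problem: 98 calendar days is 14 weeks. A small sketch of a duration normaliser, with illustrative patterns rather than an exhaustive grammar:

```python
import re

# Normalisation sketch: map duration phrasings to weeks so the validator
# compares like with like, whatever wording the contract uses.

def to_weeks(text: str):
    m = re.search(r"(\d+)\s*(calendar\s+)?(day|week|month)s?", text, re.I)
    if not m:
        return None  # no duration found: leave the field for human review
    n, unit = int(m.group(1)), m.group(3).lower()
    factor = {"day": 1 / 7, "week": 1, "month": 30 / 7}[unit]
    return round(n * factor, 1)
```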
00:34
Validate
The validator flags the extracted 14-week value against the procurement business rule: any lead time longer than 8 weeks on this equipment class requires procurement-director sign-off. The flag routes the extraction to human review before the contract can be acknowledged.
00:41
Route & resolve
Procurement lead opens the extraction, clicks the lead-time field, jumps directly to page 6 with the exact clause highlighted. Reads it, confirms the interpretation, sees that this particular supplier has quietly extended lead times by six weeks since the last framework agreement. The deal gets re-negotiated before signature.
Why this is the AI leverage. A procurement lead reading 23 pages at the end of a long day skims the commercial terms block — the important stuff is supposed to be on page 1, everyone knows that. Extraction reads every page with equal attention and every paragraph with equal scepticism. It doesn't skim. It doesn't get tired. And when the phrasing is unusual, it flags the uncertainty rather than guessing confidently — which is exactly what a careful human reader would do if they had eight hours for every contract instead of eight minutes.