SaaS · AI Integration

AI Document Pipeline That Cut Review Time by 80%

A Sydney legal tech startup was spending 4+ hours per case on manual document review. We built a RAG pipeline over their document corpus, with output guardrails that route low-confidence answers to human review instead of showing them to users. Review time dropped to under 8 minutes.

⚖️ Sydney, AU · Legal Tech SaaS

📅 Timeline: 8 weeks

👤 Role: AI Integration Engineer

🔒 Full NDA · Client anonymous

01 · The Challenge

4 hours of manual review per case. At scale, that's a team problem.

The client ran a SaaS platform for small law firms - contract management, deadline tracking, client communication. Their power users were spending 4–5 hours per matter reviewing uploaded documents before they could start actual legal work. The founders saw AI as the obvious solution but had tried two ChatGPT wrappers that their users didn't trust.

The real problem wasn't the AI - it was the absence of guardrails. Lawyers can't use a tool that confidently produces wrong answers. They needed a system that knew what it didn't know, cited sources, and flagged uncertainty rather than hallucinating through it.

📄

Diverse document types

PDFs, Word docs, scanned contracts - some handwritten. The pipeline needed to handle all of them.

🚫

Zero hallucination tolerance

A wrong answer in a legal context is worse than no answer. The system had to know what it didn't know.

🔐

Client data sovereignty

Legal documents couldn't leave Australian infrastructure. No third-party storage of raw content.

⚙️

Existing product integration

Had to slot into the existing Next.js + PostgreSQL product without a full rewrite.

02 · The Approach

RAG with teeth - retrieval, grounding, and guardrails.

The architecture was a Retrieval-Augmented Generation pipeline: documents are ingested, chunked, embedded, and stored in a vector database. At query time, we retrieve only the relevant chunks, pass them as grounded context to the LLM, and return answers with source citations and a confidence score. Low-confidence answers are flagged for human review instead of being delivered to the user.
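To make the retrieval step concrete, here is a minimal sketch of top-k similarity search, assuming chunks and their embeddings are already loaded in memory. In the real system the search ran inside pgvector; the `Chunk` shape and the function names below are hypothetical and only illustrate the idea.

```typescript
// Hypothetical chunk shape for illustration; not the client's actual schema.
interface Chunk {
  id: string;
  documentId: string;
  page: number;
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks most similar to the query embedding, best first.
function retrieveTopK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

Only the chunks returned here are passed to the model, which is what keeps the answer grounded in the uploaded documents rather than in the model's general knowledge.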

AI Feasibility Audit (Week 1)

Before writing a line of code, I spent a week with their existing document corpus - 200+ real legal documents. Tested chunking strategies, embedding models, and retrieval accuracy on questions their users actually ask. Delivered a written report on what was achievable and what wasn't.
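One way to compare chunking strategies and embedding models during an audit like this is a hit rate at k: the share of real user questions for which at least one known-relevant chunk shows up in the top k results. The sketch below is an assumption about how such a measurement could look, not the exact harness used; `EvalCase`, `Retriever`, and `hitRateAtK` are illustrative names.

```typescript
// One evaluation case: a real user question and the chunk ids known to answer it.
interface EvalCase {
  question: string;
  relevantChunkIds: string[];
}

// A retriever maps a question to ranked chunk ids, best first.
type Retriever = (question: string, k: number) => string[];

// Fraction of questions where at least one relevant chunk appears in the top k.
function hitRateAtK(cases: EvalCase[], retrieve: Retriever, k: number): number {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const evalCase of cases) {
    // Slice defensively in case the retriever returns more than k ids.
    const retrieved = new Set(retrieve(evalCase.question, k).slice(0, k));
    if (evalCase.relevantChunkIds.some((id) => retrieved.has(id))) hits++;
  }
  return hits / cases.length;
}
```

Running the same question set against each candidate configuration gives a like-for-like number to put in the written report.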

Ingestion Pipeline (Weeks 2–3)

Built a document processor that handles PDFs, DOCX, and scanned images (via AWS Textract for OCR). Documents are chunked using semantic splitters (not naive character splits), embedded using OpenAI text-embedding-3-large, and stored in pgvector on their existing RDS instance - no new infrastructure, data stays in-region.
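The difference between a naive character split and a structure-aware one can be sketched as follows. This is a simplified stand-in, not the splitter used in the project: it assumes paragraphs are separated by blank lines and packs whole paragraphs into each chunk, so a clause is never cut mid-sentence. The function name and the `maxChars` parameter are hypothetical.

```typescript
// Split text into chunks on paragraph boundaries, packing whole paragraphs
// into each chunk until adding the next one would exceed maxChars.
function chunkByParagraph(text: string, maxChars: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars || current === "") {
      // Either it fits, or this paragraph alone is oversized:
      // keep it whole rather than cutting through a clause.
      current = candidate;
    } else {
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each resulting chunk would then be embedded and written to the vector column alongside its document id and page number, so citations can point back to the source.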

RAG Query Layer + Guardrails (Weeks 4–6)

Built the retrieval and generation layer. Each query pulls the top 8 most relevant chunks and passes them as grounded context to GPT-4o under a strict system prompt. The output includes the answer, source citations with page numbers, and a confidence score. Answers below 0.7 confidence are flagged for human review instead of being displayed.
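The guardrail itself is simple once the answer carries a confidence score. Below is a minimal sketch of two pieces: formatting retrieved chunks so the model can cite document and page, and the gate that holds back answers under the 0.7 threshold. The type and function names are hypothetical, how the confidence score is computed is not covered here, and treating a non-numeric score as a review case is my assumption for the sketch.

```typescript
interface RetrievedChunk {
  documentId: string;
  page: number;
  text: string;
}

interface Citation {
  documentId: string;
  page: number;
}

interface RagAnswer {
  answer: string;
  citations: Citation[];
  confidence: number; // expected range 0..1
}

type GateResult =
  | { status: "deliver"; answer: RagAnswer }
  | { status: "needs_review"; reason: string };

// Threshold stated in the case study.
const CONFIDENCE_THRESHOLD = 0.7;

// Number each chunk and label it with its source so answers can cite "[n]".
function buildContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c, i) => `[${i + 1}] (${c.documentId}, p. ${c.page})\n${c.text}`)
    .join("\n\n");
}

// Deliver only answers at or above the threshold; everything else goes to a human.
function gateAnswer(
  result: RagAnswer,
  threshold: number = CONFIDENCE_THRESHOLD
): GateResult {
  // Assumption: a missing or non-numeric score is treated as low confidence.
  if (!Number.isFinite(result.confidence) || result.confidence < threshold) {
    return { status: "needs_review", reason: "low_confidence" };
  }
  return { status: "deliver", answer: result };
}
```

The UI only ever renders results with `status: "deliver"`; anything else lands in a review queue, which is what lets the tool say "I'm not sure" instead of guessing.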

Integration & Rollout (Weeks 7–8)

Integrated the pipeline as a new tab in the existing matter view. Built streaming responses so answers feel instant. Ran a two-week beta with 12 power users, iterated on the prompt based on their feedback, then rolled it out to the full user base. Wrote a full architecture doc so their team can maintain and extend it.

03 · Tech Stack

Boring infrastructure, smart retrieval.

No new vector database vendor to pay for. No sending data offshore. Everything runs on their existing AWS stack.

OpenAI GPT-4o · text-embedding-3-large · pgvector (PostgreSQL) · AWS Textract (OCR) · AWS S3 (document storage) · Next.js (API routes) · LangChain.js · Streaming (Vercel AI SDK)

04 · Results

From 4 hours to 8 minutes. With citations.

The feature became the product's main selling point. Their trial-to-paid conversion rate increased 28% in the first month after launch. Zero complaints of hallucinated output in production - the guardrail system worked.

80%+

Reduction in document review time - from 4+ hours to under 8 minutes per matter

28%

Increase in trial-to-paid conversion in the first month after feature launch

0

Hallucinated answers delivered to users - the guardrail system flagged every uncertain output

200+

Legal documents used to tune chunking, retrieval, and prompt strategy before a single user saw it

“We'd tried two ‘AI for law firms’ tools and our users didn't trust either. Ammar started with an audit - he told us exactly what was and wasn't possible before we committed to building anything. The confidence scoring feature was his idea, and it's now the thing our sales team leads with.”

David P.

CTO, Legal Tech SaaS (Sydney, name withheld per NDA)

Want AI that actually works in your product?

Start with a paid audit. I'll tell you what's achievable before you commit to building.

Book an AI Audit