SaaS · AI Integration
A Sydney legal tech startup was spending 4+ hours per case on manual document review. We built a RAG pipeline on their document corpus, with output guardrails that hold low-confidence answers back for human review instead of showing them to users. Review time dropped to under 8 minutes.
01 · The Challenge
The client ran a SaaS platform for small law firms - contract management, deadline tracking, client communication. Their power users were spending 4–5 hours per matter reviewing uploaded documents before they could start actual legal work. The founders saw AI as the obvious solution but had tried two ChatGPT wrappers that their users didn't trust.
The real problem wasn't the AI - it was the absence of guardrails. Lawyers can't use a tool that confidently produces wrong answers. They needed a system that knew what it didn't know, cited sources, and flagged uncertainty rather than hallucinating through it.
PDFs, Word docs, scanned contracts - some handwritten. The pipeline needed to handle all of them.
A wrong answer in a legal context is worse than no answer. The system had to know what it didn't know.
Legal documents couldn't leave Australian infrastructure. No third-party storage of raw content.
The feature had to slot into the existing Next.js + PostgreSQL product without a full rewrite.
02 · The Approach
The architecture was a Retrieval-Augmented Generation pipeline: documents are ingested, chunked, embedded, and stored in a vector database. At query time, we retrieve only the relevant chunks, pass them as grounded context to the LLM, and return answers with source citations and a confidence score. Low-confidence answers are flagged for human review rather than delivered to the user.
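To make the gating rule concrete, here is a minimal Python sketch of the last step: an answer carries its citations and a confidence score, and a single check decides whether it is shown or sent to review. The names `Citation`, `Answer`, and `route` are hypothetical and not taken from the production code, and the file name in the usage line is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    document: str  # file name of the source document
    page: int      # page the supporting passage came from


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be a score between 0.0 and 1.0
    citations: list[Citation] = field(default_factory=list)


def route(answer: Answer, threshold: float) -> str:
    """Return 'human_review' for answers below the threshold, else 'deliver'."""
    if answer.confidence < threshold:
        return "human_review"
    return "deliver"


# Usage: a draft answer scoring 0.62 against a 0.7 cut-off is held back.
draft = Answer(text="...", confidence=0.62, citations=[Citation("lease.pdf", 4)])
print(route(draft, threshold=0.7))  # prints "human_review"
```

Keeping the decision in one small function means the cut-off can be tuned without touching retrieval or generation.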
Before writing a line of code, I spent a week with their existing document corpus of 200+ real legal documents. I tested chunking strategies, embedding models, and retrieval accuracy on questions their users actually ask, then delivered a written report on what was achievable and what wasn't.
Built a document processor that handles PDFs, DOCX, and scanned images (via AWS Textract for OCR). Documents are chunked using semantic splitters (not naive character splits), embedded using OpenAI text-embedding-3-large, and stored in pgvector on their existing RDS instance - no new infrastructure, data stays in-region.
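The difference between a boundary-aware splitter and a naive character split can be shown with a simplified Python sketch. This is a stand-in, not the splitter used in the project: it assumes paragraphs are separated by blank lines and packs whole paragraphs into chunks up to a size limit, so no chunk ever cuts a paragraph in half. The function name `chunk_by_paragraph` and the `max_chars` default are hypothetical.

```python
def chunk_by_paragraph(text: str, max_chars: int = 1200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A paragraph longer than max_chars is kept whole as its own chunk
    rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first paragraph of a new chunk.
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

A production splitter would likely also respect headings and clause numbering, which matter in contracts; those rules are left out here to keep the idea visible.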
Built the retrieval and generation layer. Each query pulls the 8 most relevant chunks and passes them as grounded context to GPT-4o under a strict system prompt. The output includes the answer, source citations with page numbers, and a confidence score. Answers with confidence below 0.7 are flagged for human review instead of being displayed.
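The retrieval-then-prompt step can be sketched in Python as follows. It assumes each candidate chunk already has a similarity score from the vector search, keeps the top 8, and numbers them so the model can cite them by index. The names `RetrievedChunk`, `select_top_chunks`, and `build_grounded_prompt` are hypothetical, and the instruction wording is illustrative only, not the system prompt used in production.

```python
from dataclasses import dataclass

TOP_K = 8  # number of chunks passed to the model per query


@dataclass
class RetrievedChunk:
    document: str
    page: int
    text: str
    score: float  # similarity to the query; higher means more relevant


def select_top_chunks(candidates: list[RetrievedChunk], k: int = TOP_K) -> list[RetrievedChunk]:
    """Keep the k highest-scoring chunks, most relevant first."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]


def build_grounded_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each source so the answer can cite it as [n] with a page."""
    sources = "\n\n".join(
        f"[{i}] {c.document}, p. {c.page}\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite sources as [n]. If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Carrying the document name and page alongside each chunk is what lets the final answer point back to a specific page rather than to the corpus as a whole.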
Integrated the pipeline as a new tab in the existing matter view, with streaming responses so answers feel instant. Ran a two-week beta with 12 power users, iterated on the prompt based on their feedback, then rolled out to the full user base. Wrote a full architecture doc so their team can maintain and extend it.
03 · Tech Stack
No new vector database vendor to pay for. No sending data offshore. Everything runs on their existing AWS stack.
04 · Results
The feature became the product's main selling point. Their trial-to-paid conversion rate increased 28% in the first month after launch. Zero complaints of hallucinated output in production - the guardrail system worked.
We'd tried two “AI for law firms” tools and our users didn't trust either. Ammar started with an audit - he told us exactly what was and wasn't possible before we committed to building anything. The confidence scoring feature was his idea, and it's now the thing our sales team leads with.
Start with a paid audit. I'll tell you what's achievable before you commit to building.