10,847 invoices processed
300h manual work saved
€42K missed deductions found

When Rousseau & Partners, a Lyon-based accounting firm with 12 employees, came to us in October 2024, they had a problem that thousands of businesses share: a decade of paper invoices (10,847 of them) sitting in filing cabinets, completely unsearchable.

Their CFO, Marc Rousseau, described the situation bluntly: "Every time a client calls asking about a vendor payment from 2019, one of my team members spends 45 minutes physically searching through binders. It's an embarrassment and a cost."

By Monday morning, those invoices were a fully structured, searchable Excel database. Here's exactly how we did it.

The Problem With Paper Invoices

Paper and PDF invoices represent one of the most common, and most costly, data silos in business. The average SME has between 2,000 and 50,000 invoices in non-searchable formats. The cost shows up in multiple ways:

💡 Key insight from this project

When we structured Rousseau & Partners' invoices, their accountant identified 847 invoices that had never been properly categorized, resulting in €42,000 in legitimate deductions they were able to claim in their annual filing.

Step 1: Document Collection & Assessment

The first step was understanding what we were dealing with. Rousseau & Partners had three types of invoice documents:

All files were shared with us via a secure Google Drive folder: 23 GB of PDFs organized by year. Our secure portal ingests from Google Drive, Dropbox, WeTransfer, or direct upload.

Step 2: AI Processing Pipeline

Our processing pipeline for invoice extraction runs in four stages:

  1. Document classification: AI automatically separates invoices from receipts, credit notes, and purchase orders
  2. Field extraction: OCR + GPT-4 Vision identifies vendor name, invoice number, date, line items, subtotal, VAT amount, and total
  3. Data normalization: dates standardized to ISO format, currency symbols unified, vendor names deduplicated
  4. Quality assurance: a human reviewer spot-checks 5% of records and flags anomalies for re-processing
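
To make the normalization stage more concrete, here is a minimal Python sketch of what stage 3 could look like. It is an illustration under stated assumptions, not our production code: the function names, the list of date layouts, and the list of legal suffixes are all hypothetical, and a real archive would need its own rules.

```python
import re
from datetime import datetime
from decimal import Decimal

# Assumed date layouts for illustration; extend to match the actual archive.
DATE_FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%d-%m-%Y", "%Y-%m-%d")

# Assumed legal-form suffixes to ignore when comparing vendor names.
LEGAL_SUFFIXES = {"sarl", "sas", "sa", "gmbh", "ltd", "inc"}


def normalize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    cleaned = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip currency symbols and spaces; treat the last ',' or '.' as the decimal mark.

    Limitation: a value like '1.234' is read as 1.234, not 1234.
    """
    digits = re.sub(r"[^\d,.\-]", "", raw)
    last_sep = max(digits.rfind(","), digits.rfind("."))
    if last_sep == -1:
        return Decimal(digits)
    integer_part = re.sub(r"[,.]", "", digits[:last_sep])
    return Decimal(f"{integer_part}.{digits[last_sep + 1:]}")


def vendor_key(name: str) -> str:
    """Build a comparison key: lowercase, no punctuation, no legal suffix."""
    words = re.sub(r"[^\w\s]", "", name.casefold()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


print(normalize_date("14/03/2019"))    # 2019-03-14
print(normalize_amount("1 234,56 €"))  # 1234.56
print(vendor_key("Dupont S.A.R.L.") == vendor_key("DUPONT SARL"))  # True
```

Records whose `vendor_key` values match can then be grouped as one vendor. Amounts are kept as `Decimal` rather than `float` so that VAT totals do not pick up rounding noise.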

Step 3: Output Structure

For Rousseau & Partners, we delivered a structured Excel workbook with four sheets:

The Results

The processing ran from Friday evening to Sunday morning, approximately 36 hours including QA. The final accuracy rate was 98.9% for digital PDFs and 96.4% for scanned documents.
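
For readers curious how a 5% spot-check and an accuracy figure can fit together, the Python sketch below shows one plausible approach. It is an assumption for illustration only: the function names are hypothetical, and measuring accuracy as correct fields divided by checked fields is one common convention, not necessarily the exact method used on this project.

```python
import random


def pick_spot_check(record_ids, rate=0.05, seed=0):
    """Draw a reproducible random sample of record ids for human review."""
    ids = list(record_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, k)


def field_accuracy(checked):
    """checked: iterable of (fields_correct, fields_total), one pair per reviewed record."""
    pairs = list(checked)
    correct = sum(c for c, _ in pairs)
    total = sum(t for _, t in pairs)
    return correct / total if total else 0.0


# 10,847 invoices at a 5% rate gives a review batch of 542 records.
sample = pick_spot_check(range(1, 10_848))
print(len(sample))  # 542
```

Fixing the seed keeps the review batch reproducible, so a second reviewer can audit exactly the same records.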

Marc Rousseau's response after delivery: "Our accountant spent Monday morning in the spreadsheet and found €42,000 in previously uncategorized deductible expenses. The service paid for itself before lunch."

How to Get This Done For Your Business

If you have invoices, contracts, or any document archive that's sitting unstructured, the process is simple:

  1. Choose a plan on OnyorAI that matches your page volume
  2. Upload your files via our secure portal or share a cloud folder link
  3. Receive your structured Excel, CSV, or Airtable database within your plan's turnaround time
  4. Start searching, filtering, and analyzing your data immediately

🚀 Ready to digitize your document archive?

Our Starter plans start at €25.99 for up to 50 pages. Enterprise clients with 50,000+ invoices can contact us for a custom quote. Every order comes with a 30-day money-back guarantee.