Invoice Data Extractor

Parses extracted PDF text to find common invoice fields using Text.ParseText.RegexParse, Text.CropText.CropTextBetweenFlags, and Text.SplitText.SplitWithDelimiter. Assumes the input is already-extracted text from Pdf.ExtractTextFromPDF or OCR.
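
As a rough illustration of the two extraction styles named above, the following Python sketch mimics a regex capture and a crop between two flags. It is an analogue written for this page, not the script itself: the function names, the sample text, and the regex patterns are all assumptions, and the real actions take their own parameters.

```python
import re

NOT_FOUND = "NOT FOUND"  # sentinel used when a field cannot be extracted


def crop_between_flags(text, from_flag, to_flag):
    """Return the text between two marker strings, or the sentinel."""
    start = text.find(from_flag)
    if start == -1:
        return NOT_FOUND
    start += len(from_flag)
    end = text.find(to_flag, start)
    if end == -1:
        return NOT_FOUND
    return text[start:end].strip()


def regex_field(text, pattern):
    """Return the first capture group of the first match, or the sentinel."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else NOT_FOUND


# Hypothetical extracted text; real invoices will need different flags and patterns.
sample = (
    "Invoice No: INV-1042\n"
    "Bill To:\nAcme Corp\n"
    "Ship To:\nWarehouse 7\n"
    "Total: 1,250.00\n"
)

invoice_number = regex_field(sample, r"Invoice\s*No[:.]?\s*(\S+)")
bill_to = crop_between_flags(sample, "Bill To:", "Ship To:")
total = regex_field(sample, r"Total[:.]?\s*([\d,]+\.\d{2})")
po_number = regex_field(sample, r"PO\s*Number[:.]?\s*(\S+)")  # absent in sample
```

Regex capture suits single-value fields with a predictable label, while cropping between flags suits multi-line blocks such as addresses.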


Problem this solves

Invoice processing is one of the most common RPA use cases, and extracting structured data from unstructured PDF text is hard: labels, field positions, and line-item layouts vary from one vendor to the next.

Usage Notes

  1. Run patterns 1-6 sequentially for each invoice. Pattern 7 wraps the batch.
  2. Setting DetectLayout: True on Pdf.ExtractTextFromPDF is critical for line item extraction, because the line item parser relies on layout-preserved text.
  3. Section markers need tuning: update the FromFlag/ToFlag values to match your actual invoices.
  4. Line item parsing is heuristic: it splits on double spaces in layout-preserved PDF text.
  5. A NOT FOUND sentinel value is set when a field can't be extracted. Check for it in downstream steps before using the field.
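
Notes 4 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script's own logic: the four-column layout (description, quantity, unit price, amount), the function names, and the field names are hypothetical.

```python
import re

NOT_FOUND = "NOT FOUND"  # sentinel described in note 5


def parse_line_item(line):
    """Split a layout-preserved row on runs of two or more spaces.

    Returns a dict for rows with exactly four columns (an assumed layout),
    otherwise None so that headers and totals are skipped.
    """
    cols = [c.strip() for c in re.split(r"\s{2,}", line.strip()) if c.strip()]
    if len(cols) != 4:
        return None
    description, qty, unit_price, amount = cols
    return {
        "description": description,
        "qty": qty,
        "unit_price": unit_price,
        "amount": amount,
    }


def missing_fields(fields):
    """List the field names whose value is still the sentinel."""
    return [name for name, value in fields.items() if value == NOT_FOUND]


row = parse_line_item("Widget A      2     10.00     20.00")
skipped = parse_line_item("Subtotal   20.00")
gaps = missing_fields({"invoice_number": "INV-1042", "po_number": NOT_FOUND})
```

Because single spaces inside a description are kept intact, the split only works when column gaps are at least two spaces wide, which is why layout-preserved extraction matters here.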

Dependencies

  • Regex Pattern Library
  • Multi-Line Text Block Parser