Invoice Data Extractor
Parses extracted PDF text to find common invoice fields using Text.ParseText.RegexParse, Text.CropText.CropTextBetweenFlags, and Text.SplitText.SplitWithDelimiter. Assumes the input is already-extracted text from Pdf.ExtractTextFromPDF or OCR.
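To make the three text operations concrete, here is a minimal Python sketch of what each one does conceptually. These are plain-Python analogs, not the automation actions themselves: the function names `regex_parse`, `crop_between_flags`, and `split_with_delimiter`, the regex pattern, the flag strings, and the sample invoice text are all assumptions chosen for illustration.

```python
import re
from typing import List, Optional


def regex_parse(text: str, pattern: str) -> Optional[str]:
    """Return the first capture group of the first match, or None if no match."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None


def crop_between_flags(text: str, from_flag: str, to_flag: str) -> Optional[str]:
    """Return the text between from_flag and the next to_flag, or None if either is missing."""
    start = text.find(from_flag)
    if start == -1:
        return None
    start += len(from_flag)
    end = text.find(to_flag, start)
    if end == -1:
        return None
    return text[start:end].strip()


def split_with_delimiter(text: str, delimiter: str) -> List[str]:
    """Split on a delimiter and drop empty or whitespace-only parts."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


# Hypothetical extracted text; real invoices will use different labels.
SAMPLE = (
    "Invoice Number: INV-1001\n"
    "Bill To:\n"
    "Acme Corp\n"
    "12 Main Street\n"
    "Ship To:\n"
    "Warehouse 4\n"
    "Total Due: 1,250.00\n"
)

invoice_number = regex_parse(SAMPLE, r"Invoice\s+Number:\s*(\S+)")
bill_to = crop_between_flags(SAMPLE, "Bill To:", "Ship To:")
bill_to_lines = split_with_delimiter(bill_to or "", "\n")

print(invoice_number)
print(bill_to_lines)
```

The regex handles single-value fields, the flag-based crop isolates a multi-line section, and the split breaks that section into lines. Returning `None` on a miss keeps a failed extraction distinguishable from an empty value.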
Problem this solves
Invoice processing is one of the most common RPA use cases, and extracting structured data from unstructured PDF text is hard.
Usage Notes
1. Run patterns 1-6 sequentially for each invoice. Pattern 7 wraps the batch.
2. Set DetectLayout: True on Pdf.ExtractTextFromPDF. Line item extraction depends on the preserved layout.
3. Section markers need tuning: update the FromFlag/ToFlag values to match your actual invoices.
4. Line item parsing is heuristic: it splits on double spaces in layout-preserved PDF text.
5. When a field can't be extracted, it is set to the sentinel value NOT FOUND. Check for this value downstream before using the field.
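The double-space heuristic and the NOT FOUND sentinel from the notes above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's actual logic: the four-column layout (description, quantity, unit price, amount), the helper names `parse_line_item` and `missing_fields`, and the sample lines are all hypothetical.

```python
import re
from typing import Dict, List

NOT_FOUND = "NOT FOUND"  # sentinel named in the usage notes

# Assumed column layout; real invoices may have more or fewer columns.
COLUMNS = ["description", "quantity", "unit_price", "amount"]


def parse_line_item(line: str) -> Dict[str, str]:
    """Split one layout-preserved line on runs of two or more spaces."""
    parts = [p for p in re.split(r" {2,}", line.strip()) if p]
    if len(parts) != len(COLUMNS):
        # Heuristic failed: mark every field so the caller can detect it.
        return {name: NOT_FOUND for name in COLUMNS}
    return dict(zip(COLUMNS, parts))


def missing_fields(record: Dict[str, str]) -> List[str]:
    """List the fields that still hold the sentinel value."""
    return [name for name, value in record.items() if value == NOT_FOUND]


# Hypothetical lines: the first keeps column gaps, the second lost them.
LINES = [
    "Widget, large      2     10.00     20.00",
    "Shipping and handling 5.00",
]

items = [parse_line_item(line) for line in LINES]

for item in items:
    gaps = missing_fields(item)
    if gaps:
        print("needs review, missing:", gaps)
    else:
        print("ok:", item)
```

Single spaces inside a description survive because only runs of two or more spaces count as column breaks. That is also why the heuristic fails when layout is not preserved, as in the second sample line.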
Dependencies
- Regex Pattern Library
- Multi-Line Text Block Parser