Adobe PDF Form Data Extraction to Dataverse
When a filled PDF form arrives (email or SharePoint), the flow extracts the form-field values with Adobe PDF Services, writes a structured record to Dataverse, and posts low-confidence or unmapped fields to Teams for review. It turns submitted PDF forms into clean structured data without re-keying.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
Status: BUILT (Flow Checker 0 errors / 0 warnings, ships Off). This flow turns filled PDF forms into clean, structured Dataverse records. When a completed PDF form lands in a SharePoint library, the flow extracts its text with Adobe PDF Services (Extract REST API), parses the configured fields, writes a structured submission record to Dataverse, and posts any unmapped / low-confidence fields to a Microsoft Teams review channel.
Why it matters: Re-keying submitted PDF forms (applications, intake, claims) is slow and error-prone. Automated extraction creates clean records and a built-in exception lane for anything that doesn't map cleanly — no manual transcription.
Use Case
Operations and Finance teams that receive completed PDF forms want the data captured automatically into Dataverse, with a human-review path for ambiguous values. A filled PDF arriving in the inbox library is detected, its fields are read by Adobe, and a governed record is created. Fields that can't be located are flagged to a Teams channel so a person can complete them — nothing is silently dropped.
Flow Architecture
When a PDF Form Arrives
SharePoint: GetOnNewFileItems (poll + splitOn)
Fires per new PDF in the configured inbox library, polling at 5-minute intervals.
Initialize variables
Initialize Variable
Correlation id (@guid()), site URL, Adobe base, work folder path, status/uris, struct-json, and the missing-fields and mapped-pairs arrays.
Get File Content
SharePoint: GetFileContent
Read the new PDF bytes.
Adobe token + asset + upload
HTTP: POST /token, POST /assets, PUT (upload)
Get an OAuth token, create an upload asset, and PUT the PDF bytes to Adobe.
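The three HTTP calls in this step can be sketched outside the flow as plain request descriptions. This is a minimal sketch, not the flow's actual action definitions: the function names are hypothetical, and the header and body shapes (form-encoded credentials for /token, a `mediaType` body for /assets) reflect one reading of the Adobe PDF Services REST API and should be checked against Adobe's reference before use. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Default value of flowlibs_AdobePdfServicesBase from the environment variables table.
ADOBE_BASE = "https://pdf-services.adobe.io"


def token_request(client_id: str, client_secret: str, base: str = ADOBE_BASE) -> dict:
    """Describe POST /token (assumed: form-encoded client credentials)."""
    return {
        "method": "POST",
        "url": f"{base}/token",
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": urlencode({"client_id": client_id, "client_secret": client_secret}),
    }


def asset_request(access_token: str, client_id: str, base: str = ADOBE_BASE) -> dict:
    """Describe POST /assets, which is expected to return an upload URI and asset id."""
    return {
        "method": "POST",
        "url": f"{base}/assets",
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "X-API-Key": client_id,  # the client id doubles as the API key
            "Content-Type": "application/json",
        },
        "body": {"mediaType": "application/pdf"},
    }


def upload_request(upload_uri: str, pdf_bytes: bytes) -> dict:
    """Describe the PUT of the raw PDF bytes to the presigned upload URI."""
    return {
        "method": "PUT",
        "url": upload_uri,
        "headers": {"Content-Type": "application/pdf"},
        "body": pdf_bytes,
    }
```

Keeping each call as a plain dictionary makes it easy to compare against the HTTP action inputs in the flow designer, and any HTTP client can then execute it.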
Start Extract + Poll Until Complete
HTTP: POST /operation/extractpdf + Until loop
Start the Extract job ({elementsToExtract:["text"]}) and poll the Location URL until done/failed; capture the result ZIP downloadUri.
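The Until loop amounts to a bounded poll on the job's Location URL. The sketch below illustrates that pattern under stated assumptions: `get_status` is a hypothetical callable standing in for the HTTP GET, the `status` values `done` and `failed` come from the step description, and the `resource` / `content` keys holding `downloadUri` are an assumption about Adobe's response shape that should be verified against a real job response.

```python
import time
from typing import Callable, Optional


def extract_job_body(asset_id: str) -> dict:
    """Body for POST /operation/extractpdf, requesting text elements only."""
    return {"assetID": asset_id, "elementsToExtract": ["text"]}


def poll_until_complete(
    get_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call get_status() until the job reports done or failed, or polls run out."""
    for attempt in range(max_polls):
        job = get_status()
        status = str(job.get("status", "")).lower()
        if status == "done":
            return job
        if status == "failed":
            raise RuntimeError(f"Extract job failed: {job}")
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(f"Extract job not finished after {max_polls} polls")


def result_zip_uri(job: dict) -> Optional[str]:
    """Return the first downloadUri found (assumed keys: resource, then content)."""
    for key in ("resource", "content"):
        uri = (job.get(key) or {}).get("downloadUri")
        if uri:
            return uri
    return None
```

Capping the number of polls mirrors the limit an Until loop needs anyway, so a stuck job surfaces as an error instead of running indefinitely.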
Download + Save + Unzip Result
HTTP GET + SharePoint CreateFile + ExtractFolderV2
Download the presigned result ZIP, save it to the work library, and unzip it into a per-submission folder.
Read + Flatten Structured Data
SharePoint GetFileContentByPath + Compose/Select
Read structuredData.json, isolate the elements[] text array, and flatten it into a readable string.
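The decode, isolate, and flatten sequence can be illustrated in a few lines. This is a sketch under assumptions: the function names are hypothetical, and it assumes each entry in `elements[]` may carry its text under a `Text` key (non-text elements are skipped), which should be confirmed against an actual structuredData.json.

```python
import json
from typing import List


def decode_structured_data(raw: bytes) -> dict:
    """Decode the bytes of structuredData.json, tolerating a UTF-8 BOM."""
    return json.loads(raw.decode("utf-8-sig"))


def element_texts(data: dict) -> List[str]:
    """Collect the non-empty text of each element (assumed key: "Text")."""
    texts = []
    for element in data.get("elements", []):
        text = element.get("Text") if isinstance(element, dict) else None
        if isinstance(text, str) and text.strip():
            texts.append(text.strip())
    return texts


def flatten(texts: List[str], separator: str = "\n") -> str:
    """Join the element texts into one readable string."""
    return separator.join(texts)
```

Keeping the list of element texts alongside the flattened string is useful, because the per-field matching works on individual elements while the flattened string is easier for a reviewer to read.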
Apply to each Field
Foreach (sequential) + Filter
For each {label,column} in the field-map env var, match the elements; append {label,column,value} to mapped pairs or the label to missing fields.
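The per-field loop can be sketched as follows. The source does not say how a value is taken once a label matches, so the rule used here is an assumption made for illustration: take whatever follows the label in the same element, otherwise take the next element. The function names are hypothetical; the field-map shape is the one shown for flowlibs_FormFieldMap.

```python
import json
import re
from typing import List, Optional, Tuple


def find_value(label: str, texts: List[str]) -> Optional[str]:
    """Case-insensitive label search; value rule is an illustrative assumption."""
    pattern = re.compile(re.escape(label), re.IGNORECASE)
    for index, text in enumerate(texts):
        match = pattern.search(text)
        if not match:
            continue
        # Case 1: value follows the label in the same element ("Email: a@b.c").
        remainder = text[match.end():].lstrip(": \t").strip()
        if remainder:
            return remainder
        # Case 2: label stands alone, so use the next element if there is one.
        if index + 1 < len(texts) and texts[index + 1].strip():
            return texts[index + 1].strip()
    return None


def match_fields(field_map_json: str, texts: List[str]) -> Tuple[List[dict], List[str]]:
    """Split the field map into mapped pairs and missing labels."""
    mapped, missing = [], []
    for entry in json.loads(field_map_json):
        value = find_value(entry["label"], texts)
        if value is None:
            missing.append(entry["label"])
        else:
            mapped.append(
                {"label": entry["label"], "column": entry["column"], "value": value}
            )
    return mapped, missing
```

A plain substring match like this can produce false positives (a label such as "Date" also matches "Update"), which is one reason the missing-fields list and the Teams review lane matter.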
Create Submission Record
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_SharePointSiteURL | String | https://your-tenant.sharepoint.com | Site hosting the libraries (reused). |
| flowlibs_FormInboxLibraryId | String | <configure> | Library the trigger watches for new PDFs. |
| flowlibs_FormWorkFolderPath | String | /FlowLibs PDF Form Work | Work library path for the ZIP + extracted files. |
| flowlibs_FormTableName | String | flowlibs_pdfformsubmissions | Target Dataverse entity set. |
| flowlibs_FormFieldMap | String | [{"label":"Full Name","column":"flowlibs_fullname"},{"label":"Email","column":"flowlibs_email"},{"label":"Date","column":"flowlibs_submitteddate"},{"label":"Amount","column":"flowlibs_amount"},{"label":"Signature","column":"flowlibs_signature"}] | PDF field label -> Dataverse column map. |
| flowlibs_AdobePdfServicesBase | String | https://pdf-services.adobe.io | Adobe REST base (reused). |
| flowlibs_AdobeClientId | String | <configure> | Adobe client id / X-API-Key (reused). |
| flowlibs_AdobeClientSecret | String | <configure> | Adobe client secret (reused). |
| flowlibs_AdobePollIntervalSeconds |
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| HTTP | http | POST /token, POST /assets, PUT (upload), POST /operation/extractpdf, GET (status/download) |
| SharePoint | shared_sharepointonline | GetOnNewFileItems, GetFileContent, CreateFile, ExtractFolderV2, GetFileContentByPath |
| Microsoft Dataverse | shared_commondataserviceforapps | CreateRecord |
| Microsoft Teams |
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Validation: Require key fields (e.g. Email, Amount) before creating the record; route incomplete forms entirely to review.
- Approval: Branch high-value submissions through an Approvals step before committing.
- Confidence routing: Extend the missing-fields logic to also flag values that match a label but fail a format check (regex on email/amount/date).
- Tables/structure: Add elementsToExtract:["text","tables"] to capture tabular form data.
- Per-field columns: Extend the table with real typed columns and write mapped values directly instead of the JSON blob.
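The validation and confidence-routing ideas can be prototyped as one check over the mapped pairs. This is a sketch only: the regular expressions are deliberately simple illustrations, not the flow's rules, the function and constant names are hypothetical, and the column names are taken from the default flowlibs_FormFieldMap value.

```python
import re
from typing import Iterable, List

# Illustrative format checks keyed by Dataverse column (assumed patterns).
FORMAT_CHECKS = {
    "flowlibs_email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "flowlibs_amount": re.compile(r"\$?\d+(,\d{3})*(\.\d{1,2})?"),
    "flowlibs_submitteddate": re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4}"),
}


def review_flags(mapped_pairs: List[dict], required_columns: Iterable[str] = ()) -> List[dict]:
    """Return one flag per missing required column or failed format check."""
    flags = []
    present = {pair["column"] for pair in mapped_pairs}
    for column in required_columns:
        if column not in present:
            flags.append({"column": column, "reason": "required field missing"})
    for pair in mapped_pairs:
        pattern = FORMAT_CHECKS.get(pair["column"])
        # fullmatch requires the whole value to fit the pattern.
        if pattern and not pattern.fullmatch(pair["value"].strip()):
            flags.append({"column": pair["column"], "reason": "format check failed"})
    return flags
```

An empty result means the submission can be written directly; any flags would be routed to the Teams review channel along with the missing-fields list.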
Key Expressions
The flow is intentionally light on Power Fx / WDL gymnastics; the heaviest expressions are the structured-data decode and the field-match filter. They are listed below in the order they appear in the flow.
EXPR.01 Result ZIP URL (Extract)
Resolve the presigned result ZIP URL from the Extract job.
EXPR.02 Decode structuredData.json
Decode the extracted structured JSON.
EXPR.03 Isolate elements
Pull the elements[] array from the structured data.
EXPR.04 Field match (Filter)
Case-insensitive match of a field label against extracted text.
EXPR.05 Status
Set the record status based on missing fields.