Advanced

Adobe PDF Form Data Extraction to Dataverse

When a filled PDF form arrives in a SharePoint library, the flow extracts its text with Adobe PDF Services, parses the configured form fields, writes a structured record to Dataverse, and posts low-confidence or unmapped fields to Teams for review. It turns submitted PDF forms into clean structured data without re-keying.

HTTP, SharePoint, Microsoft Dataverse, Microsoft Teams

Unique name

FlowLibsAdobePdfFormDataExtraction

Publisher

FlowLibs (flowlibs)

Version

1.0.0.0

Components

12 env vars + 1 cloud flow

Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.

What it does

Overview

Status: BUILT (Flow Checker 0 errors / 0 warnings, ships Off). This flow turns filled PDF forms into clean, structured Dataverse records. When a completed PDF form lands in a SharePoint library, the flow extracts its text with Adobe PDF Services (Extract REST API), parses the configured fields, writes a structured submission record to Dataverse, and posts any unmapped / low-confidence fields to a Microsoft Teams review channel.

Why it matters: Re-keying submitted PDF forms (applications, intake, claims) is slow and error-prone. Automated extraction creates clean records and a built-in exception lane for anything that doesn't map cleanly — no manual transcription.

Why you'd use it

Use Case

Operations and Finance teams that receive completed PDF forms want the data captured automatically into Dataverse, with a human-review path for ambiguous values. A filled PDF arriving in the inbox library is detected, its fields are read by Adobe, and a governed record is created. Fields that can't be located are flagged to a Teams channel so a person can complete them — nothing is silently dropped.

Step-by-step

Flow Architecture

When a PDF Form Arrives

SharePoint — GetOnNewFileItems (poll + splitOn)

Fires per new PDF in the configured inbox library, polling at 5-minute intervals.

Initialize variables

Initialize Variable

Initializes the correlation id (@guid()), site URL, Adobe base URL, work folder path, job status and URIs, the structured-data JSON string, and the missing-fields and mapped-pairs arrays.

Get File Content

SharePoint — GetFileContent

Read the new PDF bytes.

Adobe token + asset + upload

HTTP — POST /token, POST /assets, PUT (upload)

Get an OAuth token, create an upload asset, and PUT the PDF bytes to Adobe.
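
The three HTTP actions above can be sketched as plain request descriptions. This is a minimal illustration, not the flow's actual action definitions: the header names and body shapes reflect a general understanding of the Adobe PDF Services REST API and should be checked against Adobe's reference, and the function names are hypothetical. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Default value of flowlibs_AdobePdfServicesBase in this solution.
ADOBE_BASE = "https://pdf-services.adobe.io"


def token_request(client_id: str, client_secret: str) -> dict:
    """Describe the POST /token call (assumed form-encoded credentials)."""
    return {
        "method": "POST",
        "url": f"{ADOBE_BASE}/token",
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": urlencode({"client_id": client_id, "client_secret": client_secret}),
    }


def asset_request(access_token: str, client_id: str) -> dict:
    """Describe the POST /assets call that reserves an upload slot."""
    return {
        "method": "POST",
        "url": f"{ADOBE_BASE}/assets",
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "X-API-Key": client_id,  # the client id doubles as the API key
            "Content-Type": "application/json",
        },
        "body": {"mediaType": "application/pdf"},  # assumed body shape
    }


def upload_request(upload_uri: str, pdf_bytes: bytes) -> dict:
    """Describe the PUT of the raw PDF bytes to the presigned upload URI."""
    return {
        "method": "PUT",
        "url": upload_uri,
        "headers": {"Content-Type": "application/pdf"},
        "body": pdf_bytes,
    }
```

The upload goes to a presigned URI returned by the asset call, so it carries no Authorization header in this sketch.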

Start Extract + Poll Until Complete

HTTP — POST /operation/extractpdf + Until loop

Start the Extract job ({elementsToExtract:["text"]}) and poll the Location URL until done/failed; capture the result ZIP downloadUri.
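
The Until loop's behaviour can be approximated as follows. This is a sketch under stated assumptions: get_status stands in for the HTTP GET against the Location URL, the "in progress" status value used in the usage example is a placeholder, and interval_seconds and max_attempts are hypothetical parameters (the flow reads its interval from flowlibs_AdobePollIntervalSeconds).

```python
import time
from typing import Callable


def poll_until_complete(
    get_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a terminal status, then return that body.

    Raises TimeoutError if no terminal status arrives within max_attempts.
    """
    for _ in range(max_attempts):
        body = get_status()
        status = str(body.get("status", "")).lower()
        if status in ("done", "failed"):
            return body
        sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"job not finished after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop testable without real delays; capping the attempts avoids polling forever if Adobe never returns a terminal status.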

Download + Save + Unzip Result

HTTP GET + SharePoint CreateFile + ExtractFolderV2

Download the presigned result ZIP, save it to the work library, and unzip it into a per-submission folder.
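
For local testing outside Power Automate, the save-and-unzip step can be imitated with the standard library. This is only an analogy for what CreateFile and ExtractFolderV2 do in SharePoint; naming the per-submission folder after the correlation id is an assumption, and unzip_result is a hypothetical helper.

```python
import io
import zipfile
from pathlib import Path
from typing import List


def unzip_result(zip_bytes: bytes, work_root: Path, correlation_id: str) -> List[str]:
    """Extract the result ZIP into work_root/<correlation_id>.

    Returns the sorted list of archive member names.
    """
    target = work_root / correlation_id
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```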

Read + Flatten Structured Data

SharePoint GetFileContentByPath + Compose/Select

Read structuredData.json, isolate the elements[] text array, and flatten it into a readable string.
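
The flattening step might look like the sketch below. It assumes each element carries its text under a "Text" key (the same key the field-match expression reads) and that joining with newlines is an acceptable "readable string"; the flow's actual separator is not documented here.

```python
from typing import List


def flatten_elements(elements: List[dict]) -> str:
    """Join the non-empty Text values of the elements, one per line."""
    texts = [str(element.get("Text") or "").strip() for element in elements]
    return "\n".join(text for text in texts if text)
```

Elements without text (figures, for example) are skipped rather than emitted as blank lines.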

Apply to each Field

Foreach (sequential) + Filter

For each {label,column} in the field-map env var, match the elements; append {label,column,value} to mapped pairs or the label to missing fields.
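
The loop above can be sketched in a few lines. One detail is an assumption: the source does not say how the value is derived from a match, so this illustration simply takes the full text of the first matching element. map_fields is a hypothetical name.

```python
from typing import Dict, List, Tuple


def map_fields(
    field_map: List[Dict[str, str]], elements: List[dict]
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split the configured fields into mapped pairs and missing labels."""
    mapped: List[Dict[str, str]] = []
    missing: List[str] = []
    for field in field_map:
        label = field["label"]
        needle = label.lower()
        # Case-insensitive containment, treating a missing Text as "".
        hits = [e for e in elements if needle in str(e.get("Text") or "").lower()]
        if hits:
            value = str(hits[0].get("Text") or "")  # assumption: first hit wins
            mapped.append({"label": label, "column": field["column"], "value": value})
        else:
            missing.append(label)
    return mapped, missing
```

Because matching is by substring, a short label such as "Date" will also match text like "Update"; stricter matching is one of the tweaks worth considering if labels collide.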

Create Submission Record

Solution config

Environment Variables

Schema name	Type	Default	Description
flowlibs_SharePointSiteURL	String	https://your-tenant.sharepoint.com	Site hosting the libraries (reused).
flowlibs_FormInboxLibraryId	String	<configure>	Library the trigger watches for new PDFs.
flowlibs_FormWorkFolderPath	String	/FlowLibs PDF Form Work	Work library path for the ZIP + extracted files.
flowlibs_FormTableName	String	flowlibs_pdfformsubmissions	Target Dataverse entity set.
flowlibs_FormFieldMap	String	[{"label":"Full Name","column":"flowlibs_fullname"},{"label":"Email","column":"flowlibs_email"},{"label":"Date","column":"flowlibs_submitteddate"},{"label":"Amount","column":"flowlibs_amount"},{"label":"Signature","column":"flowlibs_signature"}]	PDF field label -> Dataverse column map.
flowlibs_AdobePdfServicesBase	String	https://pdf-services.adobe.io	Adobe REST base (reused).
flowlibs_AdobeClientId	String	<configure>	Adobe client id / X-API-Key (reused).
flowlibs_AdobeClientSecret	String	<configure>	Adobe client secret (reused).
flowlibs_AdobePollIntervalSeconds

Auth dependencies

Connectors & Connections

Connector	API name	Actions used
HTTP	http	POST /token POST /assets PUT (upload) POST /operation/extractpdf GET (status/download)
SharePoint	shared_sharepointonline	GetOnNewFileItems GetFileContent CreateFile ExtractFolderV2 GetFileContentByPath
Microsoft Dataverse	shared_commondataserviceforapps	CreateRecord
Microsoft Teams

Tweaks & variations

Customization Guide

Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.

Validation: Require key fields (e.g. Email, Amount) before creating the record; route incomplete forms entirely to review.
Approval: Branch high-value submissions through an Approvals step before committing.
Confidence routing: Extend the missing-fields logic to also flag values that match a label but fail a format check (regex on email/amount/date).
Tables/structure: Add elementsToExtract:["text","tables"] to capture tabular form data.
Per-field columns: Extend the table with real typed columns and write mapped values directly instead of the JSON blob.
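
As an illustration of the confidence-routing variation, the sketch below flags mapped values that fail a format check. The patterns are deliberately simple examples, not validated business rules, and the sketch assumes each value has already been reduced to the bare field value (no label prefix). FORMAT_CHECKS and flag_bad_formats are hypothetical names.

```python
import re
from typing import Dict, List

# Example patterns only; tighten them to match your own forms.
FORMAT_CHECKS = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "Amount": re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d{1,2})?"),
    "Date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def flag_bad_formats(mapped_pairs: List[Dict[str, str]]) -> List[str]:
    """Return labels whose value does not fully match its format pattern."""
    flagged: List[str] = []
    for pair in mapped_pairs:
        pattern = FORMAT_CHECKS.get(pair["label"])
        if pattern and not pattern.fullmatch(str(pair["value"]).strip()):
            flagged.append(pair["label"])
    return flagged
```

Labels returned here could be appended to the missing-fields array so they travel through the same review path.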

Helpers & literals

Key Expressions

The flow is intentionally light on Workflow Definition Language (WDL) gymnastics; the heaviest expressions are the result-URL fallback chain and the field-match filter. They are listed below in the order they appear in the flow.

EXPR.01 Result ZIP URL (Extract)

Resolve the presigned result ZIP URL from the Extract job.

workflow definition language

@coalesce(body('Adobe_Get_Job_Status')?['content']?['downloadUri'], body('Adobe_Get_Job_Status')?['asset']?['downloadUri'], variables('varResultZipUri'))
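
For readers more at home in a general-purpose language, the fallback chain in this expression behaves roughly like the Python below. It is an analogy, not part of the flow: dig mimics the null-safe ?['key'] lookups and coalesce returns the first non-null argument. All three function names are hypothetical.

```python
from typing import Any


def dig(obj: Any, *keys: str) -> Any:
    """Null-safe nested lookup, similar to chaining ?['key'] in WDL."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None (empty strings count)."""
    for value in values:
        if value is not None:
            return value
    return None


def result_zip_uri(status_body: Any, fallback: Any) -> Any:
    """Prefer content.downloadUri, then asset.downloadUri, then the fallback."""
    return coalesce(
        dig(status_body, "content", "downloadUri"),
        dig(status_body, "asset", "downloadUri"),
        fallback,
    )
```

Note that an empty string is not null, so an empty downloadUri would win over the fallback variable.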

EXPR.02 Decode structuredData.json

Decode the extracted structured JSON.

workflow definition language

@base64ToString(body('Get_Structured_Data_Json')?['$content'])

EXPR.03 Isolate elements

Pull the elements[] array from the structured data.

workflow definition language

@coalesce(json(variables('varStructJson'))?['elements'], createArray())
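
EXPR.02 and EXPR.03 together amount to "decode the base64 file body, parse it, and default to an empty list". A rough Python equivalent, assuming the connector returns the file as an object with a base64 "$content" property (as the expression implies) and that the file is UTF-8 encoded:

```python
import base64
import json
from typing import List


def decode_content(file_body: dict) -> str:
    """Decode the base64 '$content' payload to text (UTF-8 assumed)."""
    return base64.b64decode(file_body.get("$content") or "").decode("utf-8")


def isolate_elements(struct_json: str) -> List[dict]:
    """Return the elements array, or an empty list when it is absent."""
    data = json.loads(struct_json)
    elements = data.get("elements") if isinstance(data, dict) else None
    return elements if elements is not None else []
```

Defaulting to an empty list means the later per-field loop finds no matches and reports every field as missing, rather than failing on a null.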

EXPR.04 Field match (Filter)

Case-insensitive match of a field label against extracted text.

workflow definition language

@contains(toLower(coalesce(item()?['Text'],'')), toLower(items('Apply_to_each_Field')?['label']))

EXPR.05 Status

Set the record status based on missing fields.

workflow definition language

@if(greater(length(variables('varMissingFields')),0),'Needs Review','Extracted')

Make it yours

Customize & download

Generate a ready-to-import copy of this solution with your environment-variable values baked in — available on Base, Pro, or Team.

