Data Factory Lineage and Run Audit Log
After each Azure Data Factory pipeline run, the flow records a lineage and audit entry in Dataverse — source, target, row counts, parameters, duration, outcome, and triggering user/event — building a searchable history for compliance and impact analysis. Gives ADF runs a governed audit trail.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This flow gives Azure Data Factory (ADF) pipeline runs a governed audit trail. At the end of each run, Data Factory calls the flow through an HTTP request trigger and passes the run id. The flow reads the run's metadata through the ADF connector, reads the activity-level lineage (source, target, rows read/copied) from the ADF REST API, writes a complete lineage and audit entry to a Dataverse table, classifies the outcome, and emails the data team when a run fails or shows an abnormal row-count deviation. A correlation id is minted per run for cross-system tracing.
When data is wrong, you need to know which run produced it, from what source, to what target, and how many rows moved. A governed run and lineage log makes audits and impact analysis fast and surfaces silent data loss as it happens.
The flow ships turned off (it is a FlowLibs reference flow); going live requires only connection authorization and environment-variable configuration.
Use Case
A data team needs an auditable, searchable record of every Data Factory pipeline run and its lineage — source, target, row counts, parameters, duration, outcome, and the identity/event that triggered it — for compliance and impact analysis.
The flow is ideal for teams that need:
- Event-driven logging of each pipeline run the moment it finishes
- Full lineage capture: source, target, rows read/copied, duration, invoked-by
- A governed Dataverse audit table for compliance and impact analysis
- Classification of, and email alerts on, failures and abnormal row-count deviations
Flow Architecture
When an HTTP request is received
Request (manual): Receives { runId, pipelineName } POSTed by a Web activity at the tail of each ADF pipeline.
Init varRunId, varCorrelationId
InitializeVariable: Capture the run id from the trigger; mint a correlation id for cross-system tracing.
Init ADF location variables
InitializeVariable (x3): Load subscription id, resource group, and Data Factory name from environment variables.
Init anomaly + notify variables
InitializeVariable (x2): Load the row-anomaly threshold and the notification recipient.
Init lineage holders
InitializeVariable (x4): varRowsRead, varRowsCopied, varSource, varTarget (defaults 0 / Unknown).
Get Pipeline Run Details
Azure Data Factory - GetPipelineRun: Read status, parameters, runStart/runEnd, durationInMs, invokedBy.
Query Activity Runs
HTTP (ARM OAuth) - queryActivityruns: ADF REST call for activity-level input/output: source, target, row counts. The ADF connector has no activity-run operation, so this one call is HTTP with service-principal OAuth.
Parse lineage variables
SetVariable (x4): Parse the first activity's rowsRead/rowsCopied/source/target from the REST response.
Compose Outcome
Compose: Classify the run as Failed, Anomaly, or Succeeded.
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_LineageTable | String | flowlibs_adflineages | Audit table entity-set name. |
| flowlibs_DataFactoryName | String | <configure> | Data Factory name. |
| flowlibs_AzureSubscriptionId | String | <configure> | Subscription holding the Data Factory. |
| flowlibs_ResourceGroupName | String | <configure> | Resource group of the Data Factory. |
| flowlibs_AzureTenantId | String | <configure> | Tenant for the ARM OAuth call. |
| flowlibs_AzureClientId | String | <configure> | App-registration client id (ARM OAuth). |
| flowlibs_AzureClientSecret | String | <configure> | Client secret — replace at deploy; Key Vault recommended. |
| flowlibs_RowAnomalyPct | String | 50 | Allowed % deviation between rows read and copied before a run is flagged as an anomaly. |
| flowlibs_AnomalyNotifyEmail | String | alerts@yourcompany.com | Recipient for failure/anomaly emails. |
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Azure Data Factory | shared_azuredatafactory | GetPipelineRun |
| Microsoft Dataverse | shared_commondataserviceforapps | CreateRecord |
| Office 365 Outlook | shared_office365 | SendEmailV2 |
| HTTP | n/a (built-in) | queryActivityruns (ADF REST via ARM OAuth) |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Wire the trigger: Add a Web activity at the end of each ADF pipeline that POSTs { runId, pipelineName } to the flow URL. Turn the flow On to get the URL.
- Multi-activity lineage: The flow parses the first activity's row counts. For pipelines with several copy activities, swap the Set-Variable expressions for an Apply-to-each over body('Query_Activity_Runs')?['value'] and aggregate.
- Impact graph: Add columns linking source -> target chains, or push lineage to Microsoft Purview.
- Anomaly logic: flowlibs_RowAnomalyPct controls the read-vs-copied tolerance; extend Compose Outcome to compare against a historical average from prior rows in the table.
- Retention: Add a scheduled companion flow to prune flowlibs_adflineage rows per your audit-retention policy.
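The multi-activity aggregation described in this guide can be sketched outside the flow as follows. This is a minimal Python illustration, not the flow's own Apply-to-each. It assumes the queryActivityruns response carries its activity runs in a `value` array and that copy activities report `rowsRead` and `rowsCopied` under `output`. The function name `aggregate_row_counts` is hypothetical.

```python
def aggregate_row_counts(response: dict) -> tuple:
    """Sum rowsRead / rowsCopied over every activity run in the response.

    Assumption: each activity run may have an 'output' object with
    'rowsRead' and 'rowsCopied'; activities without them count as 0.
    """
    total_read = 0
    total_copied = 0
    for activity in response.get("value") or []:
        output = activity.get("output") or {}
        total_read += int(output.get("rowsRead") or 0)
        total_copied += int(output.get("rowsCopied") or 0)
    return total_read, total_copied
```

Activities that are not copy activities simply contribute zero, so the totals stay comparable with the single-activity case.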
Key Expressions
The flow is intentionally light on Power Fx / WDL gymnastics; the heaviest expressions are the outcome classification and the null-safe parsing of the lineage fields. They are listed below.
EXPR.01 Outcome classification
Failed if status is not Succeeded; Anomaly if rows read vs copied deviate beyond the configured percentage; otherwise Succeeded.
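The classification rule can be sketched in Python to make the branching explicit. The flow's actual workflow expression is not reproduced here. The deviation formula (absolute difference as a percentage of rows read) and the handling of zero rows read are assumptions, and `classify_outcome` is a hypothetical name.

```python
def classify_outcome(status: str, rows_read: int, rows_copied: int,
                     anomaly_pct: float) -> str:
    """Return 'Failed', 'Anomaly', or 'Succeeded' for one pipeline run."""
    if status != "Succeeded":
        return "Failed"
    # Assumed formula: |read - copied| as a percentage of rows read.
    if rows_read > 0:
        deviation_pct = abs(rows_read - rows_copied) / rows_read * 100
    else:
        # Nothing was read: treat any copied rows as a full deviation.
        deviation_pct = 100.0 if rows_copied > 0 else 0.0
    return "Anomaly" if deviation_pct > anomaly_pct else "Succeeded"
```

Because flowlibs_RowAnomalyPct is stored as a String, a caller would convert it first, for example `classify_outcome("Succeeded", 100, 40, float("50"))`.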
EXPR.02 Row parse (safe)
Safely reads rowsRead from the first activity, defaulting to 0.
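The same null-safe read can be sketched in Python. This assumes the REST response holds activity runs in a `value` array with `rowsRead` under `output`. `parse_rows_read` is a hypothetical helper, not part of the flow.

```python
def parse_rows_read(response: dict) -> int:
    """Read rowsRead from the first activity run, defaulting to 0."""
    activities = response.get("value") or []
    if not activities:
        return 0
    # 'output' may be missing or null for activities that moved no data.
    output = activities[0].get("output") or {}
    rows = output.get("rowsRead")
    return int(rows) if rows is not None else 0
```

Every missing level (no activities, no output, no rowsRead) falls back to 0, which matches the default of the varRowsRead holder.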
EXPR.03 Source / target type
Reads the source store type from the first activity input, defaulting to Unknown.
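A Python sketch of the source and target lookup follows. It assumes the first activity is a copy activity whose `input` exposes `source.type` and `sink.type`; those paths and the name `parse_store_types` are assumptions for illustration.

```python
def parse_store_types(response: dict) -> tuple:
    """Return (source_type, target_type) from the first activity's input,
    substituting 'Unknown' for anything that is missing."""
    activities = response.get("value") or []
    first_input = (activities[0].get("input") or {}) if activities else {}
    source = (first_input.get("source") or {}).get("type") or "Unknown"
    # Assumption: the target store is described by the copy activity's 'sink'.
    target = (first_input.get("sink") or {}).get("type") or "Unknown"
    return source, target
```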
EXPR.04 ADF REST URI
queryActivityruns endpoint for the run (ARM, api-version 2018-06-01).
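How the URI is assembled from the environment variables can be sketched in Python. The path follows the public ARM route for Data Factory activity-run queries; `query_activity_runs_uri` is a hypothetical helper and the flow's own expression may differ in detail.

```python
from urllib.parse import quote

ARM_BASE = "https://management.azure.com"
API_VERSION = "2018-06-01"

def query_activity_runs_uri(subscription_id: str, resource_group: str,
                            factory_name: str, run_id: str) -> str:
    """Build the queryActivityruns endpoint for one pipeline run."""
    return (
        f"{ARM_BASE}/subscriptions/{quote(subscription_id, safe='')}"
        f"/resourceGroups/{quote(resource_group, safe='')}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name, safe='')}"
        f"/pipelineruns/{quote(run_id, safe='')}/queryActivityruns"
        f"?api-version={API_VERSION}"
    )
```

The endpoint is called with POST, and to my knowledge the request body must include a `lastUpdatedAfter` and `lastUpdatedBefore` time window; check the ADF REST reference before relying on that.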