Data Factory SLA and Run Duration Monitor
On a schedule, the flow reviews recent Azure Data Factory pipeline runs, flags those exceeding their expected duration or missing their SLA window, trends run times, and alerts the data team plus updates a Power BI ops dashboard. Keeps data pipelines on schedule and within SLA.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This flow keeps Azure Data Factory pipelines on schedule and within SLA. On an hourly recurrence it pulls recent pipeline runs, re-fetches each run's authoritative metadata, classifies it (failed, over-SLA, on-track, in-progress), and writes one snapshot row per run to a governed Dataverse fact table that feeds a Power BI ops dashboard. Only when at least one run breached does it alert the data team in Microsoft Teams with a per-run summary. Slow or failed pipeline runs otherwise delay data without anyone noticing; continuous SLA monitoring surfaces breaches and degradation trends early, and the Dataverse fact table gives Power BI a clean, historical basis for trend reporting. The flow ships turned off; going live requires only authorizing the three connections, supplying the Azure service-principal and Data Factory environment variables, and setting the Teams group and channel ids.
Use Case
A data team running scheduled Azure Data Factory pipelines wants assurance that pipelines run on time and within expected duration, an auditable history of every run for trend analysis, and a proactive alert the moment a run fails or blows past its SLA window — without manually watching the ADF monitoring blade.
The flow is ideal for teams that want:
- An hourly SLA sweep over recent ADF pipeline runs
- Each run classified as Run Failed, SLA Exceeded, On Track, or In Progress
- One snapshot row per run written to a governed Dataverse fact table for Power BI
- A consolidated Teams alert posted only when a run breaches
Flow Architecture
Recurrence SLA Check
Recurrence (every 1 hour): Kicks off the SLA review on a schedule.
Initialize Correlation Id
InitializeVariable: guid() stamped on every snapshot row and the alert so one check run is traceable end to end.
Initialize config variables
InitializeVariable (x9): Subscription id, resource group, Data Factory name, SLA minutes, lookback hours, Teams group/channel, plus integer breach counter and string breach summary.
Query Recent Pipeline Runs
HTTP (Azure AD OAuth) - queryPipelineRuns: Lists recent runs over the lookback window. Built-in HTTP with Azure AD OAuth is used only because the Data Factory connector has no list/query-runs operation.
Parse Pipeline Runs
Parse JSON: Exposes each run's runId, pipelineName, status, timing, durationInMs.
Apply to each Run - Check If Run Breached
Foreach (sequential) + If condition: Evaluates each run, records a snapshot, and counts breaches. Sequential so the breach counter stays accurate.
- Get Pipeline Run Details — ADF GetPipelineRun: authoritative per-run record (status, runStart, durationInMs).
- Compose Duration Minutes — Converts durationInMs to whole minutes.
- Compose Breach Type — Classifies: Run Failed / SLA Exceeded / On Track / In Progress.
- Record SLA Snapshot — Dataverse CreateRecord: one snapshot row to the fact table (the Power BI feed).
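The loop described above can be pictured in ordinary code. The following is a minimal Python sketch of the same idea, not the flow's actual definition: the function name `sweep`, the snapshot field names, and the `breachType` key on each input run are hypothetical and assume classification has already happened.

```python
import uuid

# Breach types that count toward the alert (labels taken from the flow description).
BREACH_TYPES = {"Run Failed", "SLA Exceeded"}


def sweep(runs, correlation_id=None):
    """Process runs one at a time, mirroring the sequential Foreach.

    `runs` is a list of dicts with runId, pipelineName, status and a
    pre-computed breachType (hypothetical key used for this sketch only).
    """
    # One id per check run, stamped on every row (the flow uses guid()).
    correlation_id = correlation_id or str(uuid.uuid4())
    snapshots = []
    breach_count = 0
    summary_lines = []

    for run in runs:
        # One snapshot row per run, breached or not.
        snapshots.append(
            {
                "correlation_id": correlation_id,
                "run_id": run["runId"],
                "pipeline_name": run["pipelineName"],
                "status": run["status"],
                "breach_type": run["breachType"],
            }
        )
        # Counter and summary only grow for breaches.
        if run["breachType"] in BREACH_TYPES:
            breach_count += 1
            summary_lines.append(
                f'{run["pipelineName"]} ({run["runId"]}): {run["breachType"]}'
            )

    return snapshots, breach_count, "\n".join(summary_lines)
```

Because the loop runs one item at a time, the counter and summary are updated without races, which is the reason the flow keeps the Foreach sequential.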
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_AzureSubscriptionId | String | <configure> | Subscription containing the Data Factory. |
| flowlibs_ResourceGroupName | String | <configure> | Resource group containing the Data Factory. |
| flowlibs_DataFactoryName | String | adf-yourcompany-analytics | Name of the ADF to monitor. |
| flowlibs_AzureTenantId | String | <configure> | Tenant for the service-principal OAuth call. |
| flowlibs_AzureClientId | String | <configure> | Service-principal app id (Data Factory Reader is enough). |
| flowlibs_AzureClientSecret | String | <configure> | Service-principal secret — store via a Key Vault env var in production. |
| flowlibs_LookbackHours | String | 1 | Hours of pipeline runs to review each check. |
| flowlibs_DefaultSLAMinutes | String | 60 | SLA threshold in minutes; a successful run over this is flagged SLA Exceeded. |
| flowlibs_SLARunFactTable | String | flowlibs_adfslarunsnapshots | Logical entity-set name of the Dataverse fact table. |
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Azure Data Factory | shared_azuredatafactory | GetPipelineRun |
| Microsoft Dataverse | shared_commondataserviceforapps | CreateRecord |
| Microsoft Teams | shared_teams | PostMessageToConversation |
| HTTP | (built-in) | queryPipelineRuns (ADF REST via Azure AD OAuth) |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Routine tuning (SLA threshold, lookback window, target Data Factory) only requires changing environment variable values. The variants below go further and need small edits inside the flow definition.
- Per-pipeline SLAs: Replace the single flowlibs_DefaultSLAMinutes with a JSON map (pipeline -> SLA minutes) and look up each run's threshold in Compose Breach Type.
- Missed-run detection: Compare expected scheduled runs against returned runs and alert when a scheduled run never started.
- Live Power BI refresh: Once a Power BI connection is authorized in the tenant, add a RefreshDataset action after the loop (bind the dataset id to a new flowlibs_PowerBiDatasetId env var). Today the Dataverse fact table itself is the governed dashboard feed.
- Rolling-average anomaly: Trend flowlibs_durationminutes per pipeline and flag runs slower than the rolling average, not just a fixed SLA.
- Auto-retry transient failures: Branch on status = Failed with a transient error and call the ADF connector to re-run the pipeline.
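Two of these variants, per-pipeline SLAs and the rolling-average anomaly, come down to small lookups and comparisons. The Python sketch below illustrates the logic only; the function names, the `window` of 10 runs, and the `tolerance` factor of 1.5 are assumptions chosen for the example, not values from the flow.

```python
import json
from statistics import mean


def sla_for(pipeline_name, sla_map_json, default_minutes):
    """Look up a pipeline's SLA in a JSON map, falling back to the default.

    `sla_map_json` is a JSON object string such as '{"load_sales": 30}'.
    Environment variables are strings, so values are cast to int.
    """
    sla_map = json.loads(sla_map_json) if sla_map_json else {}
    return int(sla_map.get(pipeline_name, default_minutes))


def slower_than_rolling_average(history, current, window=10, tolerance=1.5):
    """Flag a run whose duration exceeds the recent average by `tolerance`.

    `history` holds earlier durations in minutes for one pipeline, oldest
    first. With no history there is nothing to compare against.
    """
    recent = history[-window:]
    if not recent:
        return False
    return current > mean(recent) * tolerance
```

In the flow, the history would come from the flowlibs_durationminutes values already stored in the Dataverse fact table, so no extra data source is needed for the trend check.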
Key Expressions
The flow is intentionally light on expression logic; the heaviest expressions are the lookback window calculation and the breach classification. They are listed below in the order they appear in the flow.
EXPR.01 Lookback window start
Computes the start of the lookback window for the queryPipelineRuns call.
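As an illustration of the calculation (not the flow's own expression), the Python sketch below derives the window and assembles a request for the Azure Data Factory queryPipelineRuns REST operation, which accepts `lastUpdatedAfter` and `lastUpdatedBefore` in the body. The function names are hypothetical, and the sketch assumes `now` is a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone

ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2018-06-01"  # ADF REST API version
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def lookback_window(lookback_hours, now=None):
    """Return (start, end) of the lookback window as UTC ISO 8601 strings.

    `lookback_hours` may arrive as a string, since the environment
    variable is typed String.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = now - timedelta(hours=int(lookback_hours))
    return start.strftime(TIME_FORMAT), now.strftime(TIME_FORMAT)


def query_runs_request(subscription_id, resource_group, factory_name,
                       lookback_hours, now=None):
    """Build the URL and JSON body for the queryPipelineRuns call."""
    after, before = lookback_window(lookback_hours, now)
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"/queryPipelineRuns?api-version={API_VERSION}"
    )
    body = {"lastUpdatedAfter": after, "lastUpdatedBefore": before}
    return url, body
```

Passing `now` explicitly keeps the calculation deterministic when testing; in the flow the equivalent value is the current UTC time at the moment the check runs.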
EXPR.02 Duration in minutes
Converts the run duration from milliseconds to whole minutes.
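A minimal Python sketch of the same conversion follows. The function name is hypothetical, and treating a missing duration (for example, a run that is still in progress) as 0 is an assumption made for this example; the flow may handle that case differently.

```python
MS_PER_MINUTE = 60_000


def duration_minutes(duration_in_ms):
    """Convert durationInMs to whole minutes, discarding any remainder.

    Assumption: a missing duration is treated as 0 minutes.
    """
    if duration_in_ms is None:
        return 0
    return int(duration_in_ms) // MS_PER_MINUTE
```

Because the result is truncated rather than rounded, a run of 60 minutes and 59 seconds still counts as 60 minutes when compared against the SLA threshold.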
EXPR.03 Breach classification
Run Failed for failed/cancelled; SLA Exceeded for a successful run over the SLA; otherwise On Track or In Progress.
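The decision order can be sketched in Python as below. This illustrates the rule as described, not the flow's actual expression; the function name is hypothetical, and the status strings assume the values the ADF API reports (`Failed`, `Cancelled`, `Succeeded`, with anything else treated as still running).

```python
FAILED_STATUSES = {"Failed", "Cancelled"}


def classify_run(status, duration_minutes, sla_minutes):
    """Classify one pipeline run into the four labels the flow uses."""
    # Failed and cancelled runs are breaches regardless of duration.
    if status in FAILED_STATUSES:
        return "Run Failed"
    # Only a successful run is measured against the SLA threshold.
    if status == "Succeeded":
        if duration_minutes > sla_minutes:
            return "SLA Exceeded"
        return "On Track"
    # Queued or running: no verdict yet.
    return "In Progress"
```

Note that a run finishing exactly at the threshold is On Track, since only a run over the SLA is flagged, and that a run still in progress is not flagged even if it has already run longer than the threshold.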
EXPR.04 Alert gate
Only post to Teams when at least one run breached.
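In code terms the gate is a single comparison on the breach counter. The Python sketch below shows the idea; the function name and the message wording are hypothetical and are not the text the flow actually posts.

```python
def build_alert(breach_count, breach_summary, correlation_id):
    """Return the alert text when something breached, otherwise None.

    Returning None stands in for skipping the Teams post entirely.
    """
    if breach_count <= 0:
        return None
    return (
        f"{breach_count} ADF pipeline run(s) breached\n"
        f"{breach_summary}\n"
        f"Correlation id: {correlation_id}"
    )
```

Gating on the counter rather than on the summary string keeps the check independent of how the summary is formatted.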