Flow Error Rate Spike Detector
A scheduled flow calculates rolling 7-day failure rates per flow from Dataverse run logs. When any flow's failure rate exceeds 2x its historical baseline, it posts an HTML alert to the Teams admin channel with trend data, flow owner, and a direct link to the run history.
Provided as-is, without warranty of any kind. Review and test each pattern in a non-production environment before deploying it to live automations. See our Terms.
Overview
This flow monitors the health of all cloud flows in a Power Platform environment by calculating rolling failure rates and comparing them against a historical baseline. When any flow's error rate spikes beyond a configurable threshold (default: 2x the baseline), it posts a detailed HTML alert to a Teams channel so admins can investigate before failures cascade.
Use Case
Power Platform environments with dozens or hundreds of cloud flows need proactive monitoring. Individual flow failures may go unnoticed, but a sudden spike in failure rate often signals a systemic issue — expired credentials, a connector outage, a schema change in a data source, or a recent deployment that broke dependencies. This flow catches those spikes early and routes them to the admin team with actionable context.
Flow Architecture
Recurrence
Recurrence: Runs daily at 8:00 AM UTC.
Initialize Variables
Initialize variable (×9, parallel): Binds 7 environment variables to runtime variables, plus 2 working variables (varAlertHtml for accumulating HTML rows, varSpikeCount for tracking flagged flows).
Compute Current Period Start
Compose: Calculates the start of the current monitoring window using addDays(utcNow(), mul(-1, varLookbackDays)).
Compute Baseline Period Start
Compose: Calculates the start of the baseline window using addDays(utcNow(), mul(-2, varLookbackDays)).
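The two Compose steps can be mirrored outside the flow to sanity-check window boundaries. The following is a minimal Python sketch of the same date arithmetic, not the flow's own code; the function name `window_starts` and the sample timestamp are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def window_starts(now, lookback_days):
    """Mirror the two Compose steps (hypothetical helper, not part of the flow).

    current_start  corresponds to addDays(utcNow(), mul(-1, varLookbackDays))
    baseline_start corresponds to addDays(utcNow(), mul(-2, varLookbackDays))
    """
    current_start = now - timedelta(days=lookback_days)
    baseline_start = now - timedelta(days=2 * lookback_days)
    return current_start, baseline_start


# Assumed sample run time: the 8:00 AM UTC recurrence on an arbitrary date.
now = datetime(2024, 5, 15, 8, 0, tzinfo=timezone.utc)
current_start, baseline_start = window_starts(now, 7)
print(current_start.isoformat())   # 2024-05-08T08:00:00+00:00
print(baseline_start.isoformat())  # 2024-05-01T08:00:00+00:00
```

With the default 7-day lookback, the baseline window is the 7 days immediately before the current window, so the two windows never overlap.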
List Current Period Flow Runs
Dataverse (List rows): Queries the flow session table, filtered to createdon >= currentPeriodStart. Runs in parallel with the other two data queries.
List Baseline Period Flow Runs
Dataverse (List rows): Queries the same flow session table, filtered to the preceding window (baselinePeriodStart to currentPeriodStart). Runs in parallel.
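The two window filters could be expressed as OData filter strings along the following lines. This is a hedged Python sketch for illustration only: the helper names are hypothetical, and treating the upper bound of the baseline window as exclusive (`lt`) is an assumption, since the flow's exact filter text is not shown here.

```python
from datetime import datetime, timezone


def iso_z(dt):
    """Format a timezone-aware datetime as a UTC timestamp with a Z suffix."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def current_filter(current_start):
    """Filter for the current window: createdon >= currentPeriodStart."""
    return f"createdon ge {iso_z(current_start)}"


def baseline_filter(baseline_start, current_start):
    """Filter for the baseline window.

    Assumption: the upper bound is exclusive so a run created exactly at
    currentPeriodStart is counted in one window only.
    """
    return (
        f"createdon ge {iso_z(baseline_start)} "
        f"and createdon lt {iso_z(current_start)}"
    )


current_start = datetime(2024, 5, 8, 8, 0, tzinfo=timezone.utc)
baseline_start = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
print(current_filter(current_start))
print(baseline_filter(baseline_start, current_start))
```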
List All Flows
Power Automate Management (ListFlowsInEnvironment_V2): Lists every cloud flow in the target environment to get flow names, owners, and metadata. Runs in parallel.
For Each Flow
Apply to each: Iterates over every flow returned by List All Flows and runs the per-flow analysis. It filters current runs, baseline runs, current failed runs, and baseline failed runs; computes the current failure rate (with zero-division guard) and the baseline failure rate; then evaluates the spike condition.
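The per-flow filtering steps amount to counting total and failed runs per flow in each window. A rough Python equivalent is sketched below; the flow itself uses filter actions inside the loop rather than this single pass, and the field names `flow_id` and `status` and the value `"Failed"` are placeholders, not the actual column names of the flow session table.

```python
from collections import Counter


def count_runs(runs):
    """Return (total, failed) run counts keyed by flow id.

    Assumption: each run is a dict with hypothetical keys "flow_id" and
    "status"; a status of "Failed" marks a failed run.
    """
    total = Counter()
    failed = Counter()
    for run in runs:
        total[run["flow_id"]] += 1
        if run["status"] == "Failed":
            failed[run["flow_id"]] += 1
    return total, failed


# Made-up sample data for one window.
sample_runs = [
    {"flow_id": "a", "status": "Failed"},
    {"flow_id": "a", "status": "Succeeded"},
    {"flow_id": "b", "status": "Succeeded"},
]
total, failed = count_runs(sample_runs)
print(total["a"], failed["a"], failed["b"])  # 2 1 0
```

Calling the helper once for the current window and once for the baseline window gives the four counts the per-flow analysis needs.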
Environment Variables
| Schema name | Type | Default | Description |
|---|---|---|---|
| flowlibs_TeamsGroupId | String | <configure> | Microsoft 365 Group ID for the Teams team that should receive spike alerts. |
| flowlibs_TeamsChannelId | String | <configure> | Channel ID within the Teams group where alert messages are posted. |
| flowlibs_MakerActivityFlowRunTable | String | admin_flowsession | Logical name of the Dataverse table containing flow run records. Default targets the admin_flowsession table used by Power Automate process advisor / maker activity logging. |
| flowlibs_AdminNotificationEmail | String | <configure> | Admin email address — reserved for future email integration. Set to the mailbox that should receive copies of the alert (for example alerts@yourcompany.com). |
| flowlibs_ErrorRateThresholdMultiplier | String | 2 | Factor by which the current period's failure rate must exceed the baseline rate before a spike is flagged (e.g. 2 = 2× baseline). |
| flowlibs_ErrorRateLookbackDays | String | 7 | Number of days per monitoring window. The current window covers the last N days; the baseline window covers the N days before that. |
| flowlibs_TargetEnvironmentName | String | <configure> | Environment name (GUID or display name accepted by the Power Automate Management connector) for the environment whose flows should be monitored. |
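Because the multiplier and lookback values are stored as String environment variables, they have to be converted to numbers before any arithmetic. The Python sketch below illustrates that conversion with the defaults from the table above; the `parse_settings` helper and the validation rule are assumptions for illustration, not part of the flow.

```python
def parse_settings(env):
    """Convert the two numeric String settings to numbers.

    `env` is a plain dict standing in for the environment variable values.
    Defaults match the table: multiplier "2", lookback "7".
    """
    multiplier = float(env.get("flowlibs_ErrorRateThresholdMultiplier", "2"))
    lookback_days = int(env.get("flowlibs_ErrorRateLookbackDays", "7"))
    # Assumed sanity check: non-positive values make the windows meaningless.
    if multiplier <= 0 or lookback_days <= 0:
        raise ValueError("multiplier and lookback days must be positive")
    return multiplier, lookback_days


print(parse_settings({}))  # (2.0, 7)
print(parse_settings({
    "flowlibs_ErrorRateThresholdMultiplier": "1.5",
    "flowlibs_ErrorRateLookbackDays": "14",
}))  # (1.5, 14)
```

Parsing the multiplier as a decimal rather than an integer matters for fractional values such as 1.5.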
Connectors & Connections
| Connector | API name | Actions used |
|---|---|---|
| Microsoft Dataverse | shared_commondataserviceforapps | ListRecords (Query current-period flow run session records); ListRecords (Query baseline-period flow run session records) |
| Power Automate Management | shared_flowmanagement | ListFlowsInEnvironment_V2 (List all flows in the target environment with metadata) |
| Microsoft Teams | shared_teams | PostMessageToChannel (Posts the spike alert HTML to the admin channel) |
Note — All connections are referenced as solution connection references; the flow is portable between environments as long as a connection is mapped at import time.
Customization Guide
Almost every realistic variant of this flow can be implemented by changing environment variable values. A few cases require small edits inside the flow definition — those are called out explicitly below.
- Adjust sensitivity: Change flowlibs_ErrorRateThresholdMultiplier to a higher value (e.g. 3) to reduce noise, or a lower one (e.g. 1.5) for tighter monitoring.
- Change monitoring window: Set flowlibs_ErrorRateLookbackDays to 14 for a two-week rolling window, or to 30 for a monthly trend.
- Minimum sample filter: The spike check requires > 5 runs in the current period to avoid false positives on low-volume flows. Edit the Check_If_Spike_Detected condition to adjust this threshold.
- Add email notification: Add an Outlook SendEmailV2 action after the Teams post to also email the admin using varAdminEmail.
- Target different environments: Update flowlibs_TargetEnvironmentName to monitor a specific non-default environment.
- Customize alert format: Edit the Compose_Alert_Email_Body action to modify the HTML template, add columns, or change the color scheme.
Key Expressions
The flow is intentionally light on Power Fx / WDL gymnastics; the heaviest expressions are the failure rate calculation and the spike detection check. They are listed below in the order they appear in the flow.
EXPR.01 Lookback date calculation
Used in Compute Current Period Start as addDays(utcNow(), mul(-1, varLookbackDays)); the baseline variant multiplies by -2 instead of -1.
EXPR.02 Failure rate with zero-division guard
Per-flow failure rate: returns 0 when there were no runs, otherwise failed/total as a decimal. Same shape is reused for the baseline rate.
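The guard logic can be illustrated in Python as follows. This is a sketch of the behaviour described above rather than the flow's actual expression, and the function name `failure_rate` is an assumption.

```python
def failure_rate(failed_runs, total_runs):
    """Return failed/total as a decimal, or 0 when there were no runs."""
    if total_runs == 0:
        return 0  # zero-division guard: no runs means no measurable failures
    return failed_runs / total_runs


print(failure_rate(0, 0))   # 0
print(failure_rate(3, 12))  # 0.25
```

Returning 0 for an empty window means a flow with no runs can never trigger an alert on its own, and a baseline window with no runs is treated as a baseline rate of 0.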
EXPR.03 Spike detection (3-condition AND)
Spike fires only when current rate is non-zero, the current window has > 5 runs, and current rate exceeds the multiplier × baseline rate.
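A minimal Python sketch of the three-condition check is shown below. The function and parameter names are illustrative assumptions; the defaults follow the values stated in this document (multiplier 2, more than 5 runs).

```python
def is_spike(current_rate, current_runs, baseline_rate,
             multiplier=2.0, min_runs=5):
    """Return True only when all three spike conditions hold."""
    return (
        current_rate > 0                              # some failures occurred
        and current_runs > min_runs                   # enough runs to be meaningful
        and current_rate > multiplier * baseline_rate # rate exceeds scaled baseline
    )


print(is_spike(0.30, 20, 0.10))  # True: 0.30 exceeds 2 x 0.10
print(is_spike(0.30, 4, 0.10))   # False: too few runs in the current window
print(is_spike(0.15, 20, 0.10))  # False: 0.15 does not exceed 2 x 0.10
```

One consequence of these conditions as described: when the baseline rate is 0, any non-zero current rate on a flow with more than 5 runs is flagged, which is worth keeping in mind for flows that previously never failed.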
EXPR.04 Percentage formatting
Renders the failure rate as a one-decimal percentage for the Teams HTML report.
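For comparison, the same formatting can be sketched in Python. The helper name `format_pct` is hypothetical, and the flow's own expression may differ in rounding details.

```python
def format_pct(rate):
    """Render a decimal failure rate as a one-decimal percentage string."""
    return f"{rate * 100:.1f}%"


print(format_pct(0.125))  # 12.5%
print(format_pct(0))      # 0.0%
```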