Best Practices & Standards
The practices that separate a demo from a production automation: error handling, reliability, performance, testing, monitoring, security, and governance, plus the standards that keep an estate maintainable, distilled into rules you can apply on every flow.
Error handling
Assume every external call can fail. Group the happy path into a Try scope, then add a Catch scope set to run only when the Try fails — Power Automate’s equivalent of try/catch.
- Put main logic in a Scope named `Try`.
- Add a `Catch` scope; open its Configure run after and tick *has failed*, *has timed out*, and *is skipped*.
- Inside Catch, use the `result('Try')` function (often via a Filter array) to pull the failed action and its error message.
- Log the failure (Dataverse / SharePoint / Teams) and Terminate with status `Failed` so the run history reflects reality.
Filter array
From: result('Try')
Query: item()?['status'] is equal to Failed
Then read: first(body('Filter_array'))?['error']?['message']
Mind the 30-day limit
A flow run can last at most 30 days. For long approvals or waits, set explicit timeouts and design a relay pattern (end the run, start a fresh one with state in Dataverse) rather than an open-ended wait. See Reliability below.
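The Filter array step above can be easier to reason about as ordinary code. The following is a minimal Python sketch of the same logic, not Power Automate syntax: the `sample` list is a simplified, hypothetical stand-in for what `result('Try')` returns, and `first_failed_message` is an illustrative name.

```python
from typing import Any, Optional


def first_failed_message(results: list[dict[str, Any]]) -> Optional[str]:
    """Return the error message of the first failed action, or None."""
    # Mirrors the Filter array query: item()?['status'] is equal to Failed
    failed = [r for r in results if r.get("status") == "Failed"]
    if not failed:
        return None
    # Mirrors first(body('Filter_array'))?['error']?['message'],
    # tolerating a missing 'error' object the way the ?[] operator does.
    error = failed[0].get("error") or {}
    return error.get("message")


# Hypothetical, simplified sample shaped like the output of result('Try')
sample = [
    {"name": "Get_open_orders", "status": "Succeeded"},
    {"name": "Post_to_api", "status": "Failed",
     "error": {"code": "502", "message": "Bad gateway"}},
    {"name": "Update_row", "status": "Skipped"},
]

print(first_failed_message(sample))  # Bad gateway
```

Skipped actions are deliberately ignored here: an action that is skipped because an earlier one failed carries no error of its own, so only the `Failed` entry is useful for logging.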
Reliability
Reliability is the flow doing the right thing exactly once, surviving transient failures, and recovering cleanly. Connectors and HTTP endpoints give *at-least-once* delivery, so a retried action can run twice — design for it.
- Idempotency / exactly-once. Key writes on a business identifier so a replay can’t create duplicates — use Dataverse alternate keys with *Upsert*, or a check-before-create lookup. Never assume an action runs only once.
- Transient-fault handling. Configure a per-action Retry policy (Settings → exponential or fixed, up to 90 attempts) for flaky endpoints instead of failing the whole run on the first blip.
- Dead-letter pattern. When an item still fails after retries, write the payload plus error to a *poison* / dead-letter store (Dataverse table, Service Bus queue, SharePoint) and continue — one bad record shouldn’t halt the batch.
- Relay for long waits. A run can’t exceed 30 days; pending approvals time out at that point. For longer processes, persist state to Dataverse, terminate, and have a scheduled or callback-triggered flow resume — a relay, not an open-ended wait.
- Cap concurrency where order matters. Parallelism speeds throughput but can reorder or collide writes; throttle Apply to each concurrency for ordered or rate-limited targets.
Type: Exponential
Count: 4
Interval: PT10S
Minimum: PT5S
Maximum: PT1H
Make the Catch reusable
Pair every reliability pattern with the Try/Catch scope above and a dead-letter write, so a failed item is logged and recoverable instead of silently lost.
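To see how idempotent writes, retries, and a dead-letter store fit together, here is a small Python sketch under stated assumptions. It is a generic doubling backoff, not the platform's exact retry algorithm; `order_id`, `flaky_send`, and the in-memory `store` and `dead` collections are hypothetical stand-ins for a business key, an external call, a Dataverse table, and a dead-letter table.

```python
from typing import Any, Callable


def backoff_delays(count: int, interval: int, minimum: int, maximum: int) -> list[int]:
    """Doubling delays in seconds, clamped to [minimum, maximum]."""
    return [min(maximum, max(minimum, interval * 2 ** n)) for n in range(count)]


def process_batch(
    items: list[dict[str, Any]],
    send: Callable[[dict[str, Any]], dict[str, Any]],
    store: dict[str, dict[str, Any]],
    dead_letter: list[dict[str, Any]],
    retries: int = 4,
    sleep: Callable[[float], None] = lambda seconds: None,  # no-op so the sketch runs instantly
) -> None:
    delays = backoff_delays(retries, interval=10, minimum=5, maximum=3600)
    for item in items:
        key = item["order_id"]  # hypothetical business identifier
        last_error = None
        for attempt in range(retries + 1):
            try:
                # Upsert keyed on the business id: a replay overwrites, never duplicates.
                store[key] = send(item)
                last_error = None
                break
            except Exception as exc:
                last_error = exc
                if attempt < retries:
                    sleep(delays[attempt])
        if last_error is not None:
            # Dead-letter the payload plus error and carry on with the batch.
            dead_letter.append({"payload": item, "error": str(last_error)})


calls: dict[str, int] = {}


def flaky_send(item: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical endpoint: A2 fails twice then succeeds, A3 always fails."""
    key = item["order_id"]
    calls[key] = calls.get(key, 0) + 1
    if key == "A2" and calls[key] < 3:
        raise TimeoutError("transient blip")
    if key == "A3":
        raise ValueError("malformed payload")
    return {"order_id": key, "qty": item["qty"]}


store: dict[str, dict[str, Any]] = {}
dead: list[dict[str, Any]] = []
batch = [
    {"order_id": "A1", "qty": 1},
    {"order_id": "A2", "qty": 2},
    {"order_id": "A3", "qty": 3},
    {"order_id": "A1", "qty": 1},  # replayed delivery of A1
]
process_batch(batch, flaky_send, store, dead)

print(backoff_delays(4, 10, 5, 3600))  # [10, 20, 40, 80]
print(sorted(store))                   # ['A1', 'A2']
print(len(dead))                       # 1
```

The replayed `A1` leaves a single row because the write is keyed, `A2` recovers after two transient failures, and `A3` lands in the dead-letter list without stopping the batch.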
Performance
- Filter at the source. Pass `$filter` / OData / SQL WHERE to the *get* action instead of pulling everything and filtering in the flow.
- Avoid Apply to each where you can. `Select`, `Filter array`, and `join()` run far faster than a loop for shaping data.
- Enable concurrency on Apply to each (Settings → Concurrency control) when iterations are independent, but cap it for rate-limited APIs.
- Use `$select` on Graph/Dataverse to fetch only the columns you need.
- Compose over Set variable for read-only intermediate values — Compose has no concurrency penalty inside parallel branches.
- Paginate large reads rather than raising the threshold blindly.
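As an illustration of filtering at the source, the sketch below builds an OData query string that combines `$select`, `$filter`, and `$top`, so the server returns only the rows and columns needed. The column names and the `build_query` helper are hypothetical; the percent-encoding uses Python's standard `urllib.parse`.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_query(select: list[str], filter_expr: Optional[str] = None,
                top: Optional[int] = None) -> str:
    """Build an OData query string that pushes filtering to the server."""
    params: dict[str, object] = {"$select": ",".join(select)}
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = top
    # Keep '$', ',' and quotes readable; encode spaces as %20 rather than '+'.
    return urlencode(params, quote_via=quote, safe="$,'")


# Hypothetical columns and filter for an orders list
query = build_query(["title", "status"], "status eq 'Open'", top=100)
print(query)
```

The same three parameters correspond to the filter, column-selection, and row-limit inputs that many *get* actions expose, which is usually where this shaping belongs rather than in a loop after the read.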
Testing
You can’t unit-test a cloud flow the way you test code, but you can build deliberate test harnesses and gate every release on them.
- Test flows. Build a companion flow (or a manual-trigger variant) that calls your child flow with known inputs and asserts the outputs, so you can re-run it after every change.
- Mock the inputs. Parameterise the trigger and drive branches with sample payloads (a manual trigger or a `Compose` of test JSON) instead of waiting for the real event to fire.
- Contract tests for APIs. Validate request/response shape with a Postman / REST Client collection before wiring the endpoint into a flow, then pin the schema with Parse JSON from a captured sample.
- Flow Checker. Run the maker-portal Flow Checker before you save or publish — it flags errors and rule violations early.
- Negative tests. Feed malformed, empty, and oversized inputs and confirm the Catch path and dead-letter behave as designed.
- Regression strategy. Keep a seeded set of test records with expected results and re-run the harness on every import to TEST before promoting to PROD.
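The harness idea above (known inputs, asserted outputs, at least one negative case) can be sketched in a few lines of Python. Everything here is illustrative: `route_order` is a hypothetical stand-in for the child flow under test, and the approval threshold is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Case:
    name: str
    payload: dict[str, Any]
    expected: dict[str, Any]


def run_harness(flow: Callable[[dict[str, Any]], dict[str, Any]],
                cases: list[Case]) -> list[str]:
    """Run every seeded case and return a list of human-readable failures."""
    failures = []
    for case in cases:
        try:
            actual = flow(case.payload)
        except Exception as exc:
            # Treat an exception like a failed run so negative tests can assert on it.
            actual = {"status": "Failed", "error": type(exc).__name__}
        if actual != case.expected:
            failures.append(f"{case.name}: expected {case.expected}, got {actual}")
    return failures


def route_order(payload: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical logic under test: small orders auto-approve."""
    amount = payload["amount"]  # raises KeyError on an empty payload
    return {"status": "Approved" if amount <= 1000 else "NeedsReview"}


seeded_cases = [
    Case("happy path", {"amount": 250}, {"status": "Approved"}),
    Case("over threshold", {"amount": 5000}, {"status": "NeedsReview"}),
    Case("negative: empty payload", {}, {"status": "Failed", "error": "KeyError"}),
]

failures = run_harness(route_order, seeded_cases)
print("all cases passed" if not failures else "\n".join(failures))
```

Keeping the seeded cases as data rather than code means the same set can be re-run unchanged after every import to TEST.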
Make checks a pipeline gate
Run Solution Checker (and Flow Checker) as an automated gate in your ALM pipeline so a flow can’t reach PROD with known rule violations. See Naming & ALM.
Monitoring & observability
You can’t operate what you can’t see. Per-flow run history is the floor, not the ceiling — instrument for estate-wide visibility.
- Application Insights. Route Power Platform telemetry to Application Insights so traces, failures, and performance are queryable (Kusto) across every flow and app — not trapped in one run history.
- Failure alerts. Add a monitoring flow over the Power Automate Management connector (or the built-in failure notification) that posts failures to Teams, email, or your ITSM tool the moment they happen.
- Run-history retention. Run history defaults to ~28 days in Dataverse (`FlowRunTimeToLiveInSeconds`), configurable to 7 / 14 / 28 days or a custom TTL. For audit or longer retention, log key events to your own Dataverse table.
- Analytics. Use the in-product Analytics tab (runs, actions, errors) and the CoE Starter Kit dashboards for tenant-wide trends.
- Custom telemetry. Emit checkpoints with `Trace()` in Power Fx (canvas apps / Monitor) and a structured logging child flow in Power Automate, so you capture business context the platform doesn’t.
- Correlation id. Stamp every run with a `guid()` and carry it across child flows and API calls so you can stitch a multi-flow transaction back together.
guid() -> Set variable varCorrelationId
// pass into every child flow input and outbound call header
x-correlation-id: @{variables('varCorrelationId')}
Reuse
- Child flows. Extract shared logic into solution-aware child flows (called via *Run a Child Flow*) — authored and versioned once, reused everywhere.
- Components. Use canvas/component libraries for app UI and flow templates for makers, so patterns are copied with intent instead of by hand.
- Shared connection references. One connection reference per connector and purpose, referenced by many flows, so credentials rotate in a single place.
- Custom connectors. Wrap a recurring API (auth + OpenAPI definition) into a custom connector so every flow consumes a typed, governed interface instead of raw HTTP.
- Environment variables. Centralise shared config (URLs, list names, thresholds) so reused logic binds per environment at import.
Name reusable parts predictably
Child flows, connection references, and env vars are reused most when they’re named consistently. See the conventions in Naming & ALM.
Security
- Secrets never inline. Store keys and connection strings in environment variables backed by Azure Key Vault; reference them at runtime.
- Service accounts, not people. Own production connections with a dedicated service identity so a leaver doesn’t break the flow.
- Least privilege. Grant connections and app registrations only the scopes they need (e.g. `Sites.Selected` over `Sites.FullControl.All`).
- Secure inputs/outputs. Toggle *Secure inputs* / *Secure outputs* on actions handling tokens or PII so values are masked in run history.
- Respect DLP. Confirm the environment’s data-loss-prevention policy before adding premium/HTTP connectors.
Security deep-dive
Beyond the basics, a hardened solution controls identity, rotation, and auditability — the things an auditor asks about after go-live.
- RBAC, granularly. Assign Dataverse security roles at table and column level; use column-level security for sensitive fields. Never hand a service account System Administrator.
- Service principals. Own production connections and unattended automation with an app registration / service principal (or managed identity), not a named user — it survives leavers and supports certificate auth.
- Secrets rotation. Keep secrets in Key Vault, reference them via environment variables, and set a rotation cadence; prefer certificates over client secrets where the connector allows.
- Audit logging. Enable Dataverse auditing and Microsoft Purview activity logging, and append security-relevant business events to an immutable log table.
- Protect run history. Secure inputs/outputs mask values, but also restrict *who* can open run history — error details and Compose outputs can leak data.
A logged secret is a leaked secret
A token written to a Compose or visible in run history is effectively public to anyone with access. Enable Secure outputs, never log secrets, and rotate immediately if one is exposed.
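Where a custom logging step or an external service writes payloads to a log, the same principle applies: mask before you write. The sketch below is a generic Python illustration, not a Power Automate feature; the `SENSITIVE_KEYS` list is a hypothetical example and would need to match your own payloads. Inside a flow, Secure inputs/outputs remain the primary control.

```python
from typing import Any

# Hypothetical key names to mask; extend to match your own payloads.
SENSITIVE_KEYS = {"authorization", "token", "password", "client_secret", "api_key"}


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in SENSITIVE_KEYS else redact(inner)
            for key, inner in value.items()
        }
    if isinstance(value, list):
        return [redact(inner) for inner in value]
    return value


event = {
    "headers": {"Authorization": "Bearer abc123", "Accept": "application/json"},
    "items": [{"id": 1, "token": "t-001"}],
}
print(redact(event))
```

Masking by key name only catches secrets stored under known keys, so it complements rotation and restricted run-history access rather than replacing them.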
Governance
Governance is what keeps an estate of makers safe and supportable as it scales past one team.
- DLP policies. Classify connectors as *Business*, *Non-Business*, or *Blocked* per environment, and isolate HTTP and risky connectors so data can’t cross trust boundaries.
- Managed Environments. Turn on Managed Environments for premium governance — sharing limits, weekly digests, solution-checker enforcement, maker onboarding, and IP firewall.
- CoE Starter Kit. Deploy the Center of Excellence Starter Kit to inventory apps, flows, and makers, drive compliance, and run admin automations.
- Environment strategy. Dedicated, governed DEV / TEST / PROD per workload with the Default environment locked down — cross-reference Naming & ALM.
- Tenant policies. Restrict environment creation, app/flow sharing, and trial environments to admins so the estate stays inventoried.
Govern early, not after sprawl
DLP policies and Managed Environments are far cheaper to apply before hundreds of flows exist. Stand them up alongside your first production solution.
Licensing & limits
A flow that’s functionally correct can still be throttled or turned off if it outruns its license. Design within the limits, and pick the license that matches the workload.
- Premium connectors. HTTP, custom connectors, the on-premises data gateway, and premium Dataverse use require a premium license (Power Automate Premium per-user, or Process per-flow).
- Power Platform request limits. Daily action entitlements are set by the flow owner’s performance profile (their license); exceed them and the flow is throttled, then resumes when the sliding window clears.
- Throttling. Both platform request limits and per-connector limits (e.g. ~500 requests/min for a custom connector) throttle — design batching, concurrency caps, and retries accordingly.
- Per-flow vs per-user. Process licenses dedicate capacity to one high-volume flow (stackable, +250k actions/day each); Premium per-user covers all of a single maker’s flows.
- Capacity. Watch Dataverse database, file, and log capacity — long run-history retention and audit logs consume storage.
| License | Covers | Best for |
|---|---|---|
| Seeded (Microsoft 365 / Dynamics 365) | Standard connectors only | Simple personal or office automation |
| Power Automate Premium (per user) | All of one maker’s flows, premium connectors, attended RPA | Individual makers building many flows |
| Power Automate Process (per flow) | One flow/agent with dedicated, stackable capacity | High-volume, business-critical flows |
| Pay-as-you-go (Azure meter) | Per-run billing via an Azure subscription | Spiky or low-volume workloads without standing licenses |
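For capacity planning, the stackable Process figure quoted above lends itself to simple arithmetic. This is a rough sizing sketch that takes the 250k actions/day per license from the text as given; the `headroom` buffer and the example volumes are assumptions for illustration, and actual entitlements should be checked against current licensing documentation.

```python
import math

ACTIONS_PER_PROCESS_LICENSE = 250_000  # per-day figure quoted in the list above


def process_licenses_needed(peak_daily_actions: int, headroom: float = 0.2) -> int:
    """Estimate stacked Process licenses for one flow, with a safety buffer."""
    if peak_daily_actions < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    required = peak_daily_actions * (1 + headroom)
    # At least one license is needed for the flow to run at all.
    return max(1, math.ceil(required / ACTIONS_PER_PROCESS_LICENSE))


print(process_licenses_needed(600_000))               # 3
print(process_licenses_needed(100_000))               # 1
print(process_licenses_needed(250_001, headroom=0))   # 2
```

Sizing on peak rather than average daily volume matters here, because throttling is triggered by the days that exceed the entitlement, not by the monthly mean.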
Flows get turned off automatically
Power Automate suspends a flow after 14 days of continuous errors, 14 days of consistent throttling, or 90 days with no trigger activity (premium/capacity-licensed owners are exempt from the inactivity rule). Monitor failure email and the admin center. See the official limits & configuration reference.
Maintainability
- Rename every action. `Get_open_orders` beats `Get_items_3`. Names become expression references, so fix them before you build dependencies.
- Group with scopes and add notes to explain non-obvious logic.
- Extract reuse into child flows (or a solution’s shared components) instead of copy-pasting.
- No hardcoding. Site URLs, list names, emails, and thresholds belong in environment variables.
- Build solution-aware. Author inside a solution from day one so the flow is portable across environments.
Microsoft’s Well-Architected guidance
For deeper standards, the Power Platform Well-Architected framework covers reliability, security, performance, and operational excellence with concrete checklists. The Standards mapping below ties each area back to its pillar.
Standards
Documentation & versioning
- Document the flow. Capture purpose, trigger, owner, dependencies, environment variables, and connection references in the flow description and a solution README.
- Version through solutions. Bump the solution version on every release and keep a changelog; source-control unpacked solutions so diffs are reviewable (see Naming & ALM).
- Use a version scheme. Solutions carry a `major.minor.build.revision` number; increment it deliberately rather than letting it drift.
- Notes in-flow. Annotate non-obvious branches and expressions so the next maintainer doesn’t reverse-engineer them.
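One way to make version bumps deliberate is to script them in the release pipeline. The Python sketch below increments one part of a `major.minor.build.revision` string; resetting the lower-order parts to zero is a common convention assumed here, not something the platform requires, and `bump` is an illustrative helper name.

```python
PARTS = ("major", "minor", "build", "revision")


def bump(version: str, part: str) -> str:
    """Increment one part of a four-part version and reset the parts after it."""
    numbers = [int(n) for n in version.split(".")]
    if len(numbers) != len(PARTS):
        raise ValueError(f"expected four parts, got {version!r}")
    if part not in PARTS:
        raise ValueError(f"unknown part {part!r}")
    index = PARTS.index(part)
    numbers[index] += 1
    # Assumed convention: lower-order parts restart at zero after a bump.
    for lower in range(index + 1, len(numbers)):
        numbers[lower] = 0
    return ".".join(str(n) for n in numbers)


print(bump("1.4.0.12", "revision"))  # 1.4.0.13
print(bump("1.4.0.12", "minor"))     # 1.5.0.0
print(bump("1.4.0.12", "major"))     # 2.0.0.0
```

Pairing each bump with a changelog entry keeps the version number meaningful when diffs of unpacked solutions are reviewed.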
Code-review checklist
| Review check | Why it matters |
|---|---|
| Actions renamed; scopes named | Readable expressions and clean diffs |
| Try/Catch around external calls; Terminate on failure | Failures surface honestly in run history |
| No hardcoded URLs, keys, or emails | Portable across environments |
| Secrets in Key Vault; secure inputs/outputs set | No leaked credentials or PII |
| Source filtering, $select, pagination, concurrency considered | Stays within performance and request limits |
| Idempotency, retry, and dead-letter in place | Safe to replay; no duplicates or lost items |
| Telemetry: failure logging + correlation id | Operable and traceable in production |
| Solution-aware; version bumped; DLP-compliant connectors | Clean ALM and governance |
| Tested: happy path + at least one negative test | Behaviour verified before promotion |
Well-Architected mapping
Every area above maps to one of the five Power Platform Well-Architected pillars. Use the pillar checklists to pressure-test a design before you build.
| Pillar | Workload concern | Sections above |
|---|---|---|
| Reliability | Resiliency, availability, recovery | Error handling, Reliability, Testing |
| Security | Confidentiality, integrity, availability | Security, Security deep-dive, Governance |
| Operational Excellence | Observability, DevOps, safe deployment | Monitoring, Standards, Governance |
| Performance Efficiency | Scale to meet demand | Performance, Reuse, Licensing & limits |
| Experience Optimization | Usable, intuitive experiences | Maintainability, Reuse |
Run the assessment
Microsoft’s Power Platform Well-Architected framework includes a free, pillar-by-pillar review you can use to score and iteratively improve a workload.