I have been using AI agents to build Power Automate flows. OpenClaw, specifically. Flows are JSON. Agents are good at JSON. So I expected it to just work — build and debug, end to end.
Building worked. Debugging was where it all fell apart.
And honestly? I think most people who try this will hit the exact same wall.
I started with the standard approach:
- Azure app registration
- Grant application permissions for the Power Platform APIs
- Grant a Power Platform system user role
- Let the agent call the management APIs
That part worked fine. For basic tasks, it felt like vibe coding actually worked.
The first flow the agent built for me was simple: *"When an item is created in a SharePoint list, send a Teams message to Catherine."*
It did it in one attempt. That gave me confidence.
Next I asked the agent to build a custom connector. It generated a Swagger definition and saved me time. But the connector still had errors — and each fix-deploy-test cycle cost tokens.
That was my first signal: agents can generate quickly, but debugging is where things get expensive.
The real test was using the custom connector to pull real HR data from an external system. We needed both the Employee endpoint and the Employee History endpoint. There are thousands of employees, so the flow has to paginate — loop until there is no next page.
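In flow-definition JSON, that pagination pattern is a Do until loop. A minimal sketch — the action names, the `nextLink` variable, and the generic HTTP action stand in for the real custom-connector calls, and the `nextPage` field is an assumption about the external API's response shape:

```json
{
  "Until_no_next_page": {
    "type": "Until",
    "expression": "@empty(variables('nextLink'))",
    "limit": { "count": 100, "timeout": "PT1H" },
    "actions": {
      "Get_employee_page": {
        "type": "Http",
        "inputs": { "method": "GET", "uri": "@variables('nextLink')" }
      },
      "Set_next_link": {
        "type": "SetVariable",
        "runAfter": { "Get_employee_page": [ "Succeeded" ] },
        "inputs": {
          "name": "nextLink",
          "value": "@{coalesce(body('Get_employee_page')?['nextPage'], '')}"
        }
      }
    }
  }
}
```

The loop exits once the API stops returning a next-page link and `nextLink` becomes empty.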
This is where the agent started burning through tokens. Not because it couldn't write the logic, but because it couldn't see what was actually failing.
Three things, specifically:
1. The agent couldn't catch the true error. It kept saying the loop failed due to "connection" issues. But the real error was inside a nested loop — entities were not referenced correctly. A scoping problem, not a connection problem. The agent couldn't see deep enough to know.
2. It mixed in Logic Apps concepts that don't exist in Power Automate. Agents borrow patterns from whatever training data they have. Mine kept trying `map()`, `filter()`, `select()`, or building "compose" style shapes that look right in JSON but fail at runtime. Power Automate is not Logic Apps. Close, but not the same.
3. Power Automate keeps the useful debug detail behind the UI. In the portal, you expand the failed action and see the real payload, the real error, the actual values. Through the API? The agent often couldn't get that detail. So it guessed, patched, redeployed, failed, and repeated.
That's where the token burn comes from. We went through 10–15 cycles on one flow. Easily $15–20 in LLM costs just to debug a single moderately complex workflow.
We tried the try-catch pattern my husband John wrote about back in 2018. Wrap actions in Scopes. On failure, run an error Scope. Use `result()` to capture what happened, then write it somewhere — we used a SharePoint list.
It helped. At least the agent had something to read.
But you need to implement it everywhere. Inside loops. Inside nested loops. Inside the parts that actually fail. Every debugging cycle meant modifying the flow itself just to capture error info, then modifying it again to remove the scaffolding. A lot of back and forth. And every round-trip burns tokens.
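The pattern itself is small — the cost is having to replicate it everywhere. A hedged sketch (action names are illustrative; the `Scope` type, `runAfter` statuses, and `result()` are standard flow-definition syntax):

```json
{
  "Try": {
    "type": "Scope",
    "actions": { }
  },
  "Catch": {
    "type": "Scope",
    "runAfter": { "Try": [ "Failed", "TimedOut" ] },
    "actions": {
      "Capture_error": {
        "type": "Compose",
        "inputs": "@result('Try')"
      }
    }
  }
}
```

`result('Try')` returns the status, inputs, outputs, and error for every action inside the Try scope — the Catch scope then writes that array somewhere readable, like our SharePoint list.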
John runs a SaaS called FlowStudio. He already had the actions I needed — the ones that expose real run details, per-action error info, action inputs and outputs, loop iterations. The stuff you see in the Power Automate portal UI, but available as APIs.
After watching me lose yet another round against `InvalidTemplate`, he said: *"I already have all those APIs. I'll just wrap them in an MCP server."*
And that was the game changer.
With the MCP server, the agent can finally see what actually broke. Here's what debugging looks like now:
1. Agent calls `get_live_flow_runs` - finds the failed run
2. Agent calls `get_live_flow_run_error` - gets structured, per-action error details. Not just "Failed" — the actual error message, the failing expression, the HTTP response body.
3. Agent calls `get_live_flow_run_action_outputs` - reads action inputs/outputs
4. Agent calls `update_live_flow` - deploys the fix
Four API calls. One round-trip. No try-catch scaffolding. No SharePoint. No guessing.
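Under the hood, each of those tool calls is a standard Model Context Protocol `tools/call` request (JSON-RPC 2.0). A sketch of step 2 — the argument names here are hypothetical placeholders, since FlowStudio's exact tool schema isn't shown in this post:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "get_live_flow_run_error",
    "arguments": {
      "flowName": "<flow name>",
      "runName": "<run id>"
    }
  }
}
```

Because it's plain MCP, nothing here is agent-specific — any MCP client can issue the same four calls.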
When the nested loop has a scoping problem? It's obvious now — the agent can see the actual action outputs and references.
That HR data flow that took $15–20 and 45 minutes to debug? Fixed in under 2 minutes. Pennies in token costs.
After it worked for me, I packaged everything into three GitHub Copilot agent skills:
- `power-automate-mcp` - Connect to and operate flows (list, read, trigger, resubmit, cancel)
- `power-automate-debug` - Step-by-step diagnostic workflow for failing flows
- `power-automate-build` - Build and deploy flow definitions from scratch
They work with any MCP-capable agent — GitHub Copilot, OpenClaw, Claude, or anything that speaks the Model Context Protocol.
The skills are free and open source. They need a FlowStudio MCP server to talk to. We're offering a free Starter plan (100 MCP calls, no credit card required) so you can try the full experience without committing to anything.
Get started at mcp.flowstudio.app