Best way to automate Outlook emails + PDF work orders into Excel? (n8n / Power Automate) A property maintenance company asked on r/MicrosoftFlow this week. The OP listed the failure modes that kept biting them: inconsistent email formats, PDF extraction problems, Excel append failures, table and column mapping issues, sometimes Outlook body content getting truncated. The follow-up question I had was the more interesting one: AI Builder, Copilot Studio with Work IQ Outlook, or build it yourself?
So I screenshotted the post and sent it to my Telegram bot. I was lying on the bed on a Saturday afternoon, no VS Code, no Power Automate designer, no Azure portal. On the other end of the Telegram chat is a Claude PM running TmuxAgentManager, which dispatches the build to a Codex worker in a tmux session. About two and a half hours later I had a two-flow Power Automate solution writing to Excel with a confidence-gated exception path.
The useful pattern is to combine the email body and the PDF text into one context, extract once, and use a confidence score to decide whether the record goes straight into operations or into an exception queue for human review.
What follows is the field note the Codex worker wrote, lightly edited. What I personally validated: the test emails landed in my Outlook with their PDF attachments, both flows ran in the Power Automate run history, and the extracted rows landed in WorkOrder_Operations.xlsx with the fields populated. Demo worked end to end. Great work to the worker, to the PM for keeping it on rails, and to Flow Studio MCP for giving the worker action-level visibility into every Power Automate run, which is what made the silent-failure debugging in sections [05] and [06] tractable.
The next section is written from the worker's point of view. I kept the engineering detail because the failure modes are the useful part.
|
▸ section · author switch
CODEX WORKER FIELD NOTE
author: Codex (tmux worker, dispatched via Claude PM in Telegram) · build duration: ~2h 30m (incl. mid-build SharePoint→Excel pivot) · tools: Flow Studio MCP, Azure CLI, Microsoft Graph, Office 365, Excel Online, Azure OpenAI, Document Intelligence
|
Owner input via PM: a screenshot of an r/MicrosoftFlow Reddit post for a property maintenance company. Reddit OP's stated pain: scattered information across email body and PDF, inconsistent email formats, table/column mapping issues, occasional blank Outlook bodies. Owner emphasised one constraint twice in Telegram: "One thing important. As per Reddit op. The email needs to contain PDF and job info scattered in email body and pdf". That is the load-bearing requirement. Owner explicitly asked the build-vs-buy question on the Microsoft stack.
Two viable paths fell out of the audit. Both use a custom Power Automate flow as the orchestration shell. The difference is which engine does the OCR and the extraction.
| Path | Engine | Trade-off |
|---|---|---|
| A · AI Builder native | Outlook → AI Builder Extract text in images and PDF documents (OCR prebuilt) → AI Builder Run a prompt with JSON output mode → Excel | No Azure resources to provision. No HTTP actions. JSON output mode can avoid a separate Parse JSON step in many flows. Costs a Premium license + AI Builder credits; model availability depends on region and tenant, and there is no param tuning. |
| B · HTTP + Azure OpenAI | Outlook → HTTP → Azure AI Document Intelligence (prebuilt-read) → HTTP → Azure OpenAI (gpt-4o-mini) → Parse JSON → Excel | Requires an Azure subscription, two Azure resources, key management. Full param control (temperature, max_tokens). Cheaper per call at scale. More setup overhead, plus a ~5s OCR polling delay per PDF that stacks at volume. |
What I should reject up front: AI Builder Document Processing (the custom forms model). It is template-based per collection: train on 5+ examples that share a stable layout. The Reddit OP listed "inconsistent email formats" and the work-order PDFs come in a dozen shapes; that is the disqualifier. The custom forms model is the right call when your invoices all come from the same supplier, not when your inputs are arbitrary.
Decision for this build: Path B. Reason: the owner already has an Azure AI resource provisioned and wants me to reuse it; the marginal cost of HTTP-actions over native is small when the Azure footprint exists. Path A is the recommendation if you are starting from a Power Platform license with no Azure deployment.
I never opened the Power Automate designer or the Azure portal during the build. The flow definition was patched as JSON via Flow Studio MCP. Azure resources were provisioned via az. Excel schema was managed via Graph PATCH.
Two production flows plus one helper that emits diverse test emails on demand. The first end-to-end version landed on SharePoint Lists in under an hour; the owner then pivoted the datastore to Excel mid-build (the Reddit OP's pain point list explicitly named Excel). The Excel migration was the larger of the two rebuild beats in this build: connector swap, schema redefinition via Microsoft Graph, every PostItem action replaced with AddRowV2. Trade-offs of the Excel choice are in [07].
| Environment | Flow Studio Demo |
| Intake flow | WorkOrder_01_Intake_Outlook · Outlook trigger |
| Extract flow | WorkOrder_02_AI_Extract_To_WorkOrder · HTTP trigger |
| Workbook | WorkOrder_Operations.xlsx · 4 tables (Intake, WorkOrders, Exceptions, AuditLog) |
| AI | Azure OpenAI gpt-4o-mini · Document Intelligence prebuilt-read |
The single decision that made this work: do not extract from email OR PDF separately. Concatenate first, extract once.
One model call. One response. The model decides where each field came from. Tenant name in the email body? Fine. Access code only in the PDF? Fine. Both? Fine, it merges them. The confidence field is part of the JSON contract. The model self-reports how sure it is, and the flow uses it as a routing key:
Two failure modes covered: confidence missing entirely (model returned malformed JSON) and confidence below threshold (model is unsure). Anything else flows to the WorkOrders table. Below threshold flows to Exceptions with the partial extraction preserved for human review.
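The gate reduces to a few lines of logic. A minimal Python sketch of the routing just described, assuming the build's 0.75 threshold and a `confidence` field in the parsed model output (the function name is mine, not the flow's — the real flow expresses this as a Power Automate condition):

```python
def route_work_order(extracted: dict, threshold: float = 0.75) -> str:
    """Decide which table an extraction lands in.

    Mirrors the flow's confidence gate: a missing confidence field
    (malformed model JSON) and a below-threshold score both divert
    the record to the Exceptions table for human review, with the
    partial extraction preserved.
    """
    confidence = extracted.get("confidence")
    if confidence is None:        # model returned malformed/incomplete JSON
        return "Exceptions"
    if confidence < threshold:    # model self-reports that it is unsure
        return "Exceptions"
    return "WorkOrders"           # confident extraction goes to operations
```

The useful property is that both failure modes collapse into the same safe destination, so a malformed model response can never silently reach the operational table.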
The recipe in six steps, copy-pasteable into a flow:
1. The Outlook trigger fires on a new inbound email and hands the PDF attachments to the extraction flow
2. Document Intelligence prebuilt-read (Path B) returns OCR text per attachment
3. A concat() expression merges email subject + body + PDF text into a single context string
4. A gpt-4o-mini chat completion (Path B) extracts the 16 fields
5. The flow checks the confidence field against the threshold
6. AddRowV2 writes to the WorkOrders table on pass, or to the Exceptions table on fail

The pattern only matters because real inbound work orders distribute their content unevenly. Two of the test scenarios: one with most of the detail in the email body, the other with most of the detail in the PDF.
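The combine step can be sketched in Python (the section labels and function name are mine; the real flow does this with a single concat() expression over the trigger outputs and the OCR results):

```python
def build_context(subject: str, body: str, pdf_texts: list[str]) -> str:
    """Merge email subject, email body, and the OCR text of every PDF
    attachment into one context string for a single extraction call."""
    parts = [f"EMAIL SUBJECT:\n{subject}", f"EMAIL BODY:\n{body}"]
    # Label each attachment so the model can attribute fields to a source
    parts += [f"PDF ATTACHMENT {i}:\n{text}"
              for i, text in enumerate(pdf_texts, start=1)]
    return "\n\n".join(parts)
```

One context string in, one model call out — the model, not the flow, decides which field came from which source.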
Mid-build, while the helper test-sender flow was being shaken down, archived PDFs landed in SharePoint at exactly 1500 bytes each. The owner peeked into Outlook, opened one of the archives, and messaged the PM:
The PM forwarded the report and asked me to verify the file's actual bytes, not just the size. I downloaded one of the archives and checked the magic bytes:
found:    JVBERi0x...  ← base64-encoded ASCII
expected: %PDF-1.4...  ← actual PDF binary header
Diagnosis: the helper test-sender flow was passing the attachment as a base64 string, and SharePoint CreateFile was writing those ASCII characters straight to disk. The intake flow itself was correct. The bug was upstream, in the test fixture. Fix: wrap the attachment content with base64ToBinary() before handing it to CreateFile.
Re-test: archived PDFs are now ~85 KB each, magic bytes %PDF-1.4, openable in any reader. Lesson: for any binary attachment path in Power Automate, verify the magic bytes of the destination artifact, not just the file size or HTTP status. A 200 OK with a 1.5 KB file is the same shape as a 200 OK with an 85 KB file, and the connector will not warn you.
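The check is easy to script. A small Python sketch of the magic-byte test (file contents illustrative), plus why base64-encoded PDF content always betrays itself with the same JVBERi0x prefix:

```python
import base64

PDF_MAGIC = b"%PDF"       # every valid PDF starts with these bytes
B64_PDF_MAGIC = b"JVBER"  # base64 of "%PDF" always starts like this

def classify_pdf_bytes(head: bytes) -> str:
    """Classify the first bytes of a supposed .pdf artifact."""
    if head.startswith(PDF_MAGIC):
        return "real PDF"
    if head.startswith(B64_PDF_MAGIC):
        return "base64 text written as-is (decode upstream)"
    return "not a PDF at all"

# The base64 encoding of a PDF header is deterministic, which is why
# the JVBERi0x prefix is such a reliable tell:
print(base64.b64encode(b"%PDF-1.4"))  # b'JVBERi0xLjQ='
```

The same one-prefix-check idea works for any binary format with a known signature (PNG, ZIP, OOXML), so it is a cheap assertion to add to any test fixture that moves attachments.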
During the SharePoint→Excel migration, my first patch of the extraction flow was failing in the worst possible way. Power Automate run status: Succeeded. Excel returned 200 OK on every AddRowV2 call. New rows appeared in the WorkOrders table on every run. The cells were empty. No error surfaced anywhere: not in the run history, not in the connector response, not in the Excel UI. This is the failure mode that defeats top-level monitoring: the flow succeeded as a flow, the connector accepted the request, the row appeared, and the data silently went nowhere. Tool I reached for:
Flow Studio MCP's get_live_flow_run_action_outputs, which returns the AddRowV2 action body exactly as the connector received it.

What the action actually received:
{
"item": {
"Title": "WorkOrder Scenario 1 - leak under vanity",
"PropertyAddress": "42 Ocean View Road, Unit 8, Manly",
"TradeCategory": "Plumbing"
}
}
The Excel Online Business connector does not consume a nested item object. It reads dynamic top-level keys whose names follow the item/<ColumnName> convention, visible when inspecting the connector definition. My nested body was structurally valid JSON, so the connector returned 200; my fields lived under a key the connector did not look at, so the row went in blank. Fix: flatten the body to the dynamic-column shape:
{
"item/Title": "WorkOrder Scenario 1 - leak under vanity",
"item/PropertyAddress": "42 Ocean View Road, Unit 8, Manly",
"item/TradeCategory": "Plumbing"
}
Two minutes from "I notice the cells are empty" to a fix patched into the flow definition JSON, all without opening the Power Automate portal or the Excel file. Without action-level body inspection I would have spent an hour staring at the table and re-running the flow, watching new blank rows appear, with the connector and the run status both lying to me by silently agreeing. The owner asked the broader question shortly after, "how can we improve table column mapping issues", and the answer was the one already patched into the flow: read the connector swagger, write the dynamic-column form.
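The flattening itself is mechanical. A hypothetical helper (the name is mine, not part of the build) that turns a plain field dict into the dynamic-column shape the connector actually reads:

```python
def to_dynamic_columns(fields: dict) -> dict:
    """Flatten {"Title": ...} into the {"item/Title": ...} shape that
    the Excel Online (Business) AddRowV2 action consumes. A nested
    {"item": {...}} body is accepted with 200 OK but silently ignored."""
    return {f"item/{column}": value for column, value in fields.items()}

row = to_dynamic_columns({
    "Title": "WorkOrder Scenario 1 - leak under vanity",
    "PropertyAddress": "42 Ocean View Road, Unit 8, Manly",
    "TradeCategory": "Plumbing",
})
```

Generating the body from one source dict also means a renamed Excel column only has to be fixed in one place instead of hunted through hand-written action JSON.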
From the [07] backlog, two of the items:

- Duplicate detection: build a dedupe key from PropertyAddress + IssueDescription[:50], search WorkOrders for matches in the last 7 days, and route to Exceptions on a hit.
- AI Builder-native variant (Path A): swap the HTTP actions for Extract text in images and PDF documents + Run a prompt in JSON output mode; the current build is locked to the owner's Azure subscription, while AI Builder native is portable to any Power Platform tenant.

The hard part of this build was not the AI call. It was the data plumbing across boundaries: PDF binary handling at the email connector, OCR latency at Document Intelligence, JSON contract design at the model boundary, the Excel connector's undocumented dynamic-column convention. Each boundary was inspectable through action-level inputs and outputs, and that is what made the work tractable inside a single afternoon.
|
▸ section · author switch
BACK TO CATHERINE
human validation · what I checked before publishing this
|
I want to be precise about who did what here, because the orchestration matters more than I expected it to.
The worker built a helper flow that emits work-order emails with PDF attachments on demand. It used that helper to test itself end-to-end across three diverse scenarios: one with most of the detail in the email body, one with most of the detail in the PDF, one deliberately messy with information scattered across both. I did not send any of those test emails myself.
What I did do: I peeked into Outlook on my phone and confirmed the test emails arrived with their PDF attachments visible. I confirmed both flows ran in the Power Automate run history. I opened WorkOrder_Operations.xlsx on the SharePoint MCPDemo site and confirmed the extracted rows landed in the WorkOrders table with the fields populated.
The two columns I looked at most closely were WorkOrderStatus and Confidence. Test 3 (the messy one) lands at 0.9; the rest at 1.0.
That is enough for me to say the demo worked end to end. Everything in between (the combined-context concat, the system prompt, the confidence-gate Power Fx expression, the dynamic-column Excel mappings, the SharePoint→Excel migration mid-build) was the worker's work and it stands on its own merits.
The orchestration is the part I want to flag separately. I did not write a spec. I sent the Reddit screenshot and answered a handful of Telegram questions: which mailbox to watch, the SharePoint MCPDemo site as the archive target, "yes, you can use my Azure CLI session", and, about an hour into the build, "Also the data needs to go to an excel file not SharePoint list." The PM held the build in its head for the whole afternoon, asked the worker probing verification questions when results looked too clean (did you actually exercise that path, or did you just deploy it?), and pinged me only when a decision genuinely needed me. When I reported the blank 1KB PDFs, the PM did not guess; it asked the worker to verify file bytes, not file size, and the magic-bytes diagnosis followed from there.
The worker's [07] backlog is comprehensive. Two of the seven items are the ones I would not ship without if a real property maintenance company were going to rely on this.
A human-in-the-loop gate at borderline confidence. Today the flow is binary: confidence ≥ 0.75 goes to WorkOrders, anything lower to Exceptions. In a real deployment I would split the middle band (0.75–0.89) into a Power Automate Start and wait for an approval action (email approval or a Teams adaptive card showing the extracted fields next to the original email body and PDF text), and only auto-write to WorkOrders when the approver clicks Approve. The high-confidence band (≥ 0.90) still auto-creates. This costs the dispatcher a queue of perhaps five to fifteen approvals per day in exchange for catching the cases where the model was wrong and confident, which is the failure mode that destroys trust faster than anything else.
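As a sketch, the three-band gate would route like this (band edges taken from the paragraph above; the function and destination names are illustrative, not flow actions):

```python
def route_with_approval(confidence, low=0.75, high=0.90):
    """Three-band confidence routing: auto-reject, human approval, auto-create."""
    if confidence is None or confidence < low:
        return "Exceptions"      # malformed JSON or too uncertain
    if confidence < high:
        return "AwaitApproval"   # 0.75-0.89: Start and wait for an approval
    return "WorkOrders"          # >= 0.90: auto-create
```

The point of the middle band is exactly the "wrong and confident" failure mode: scores just above today's 0.75 cutoff get a human glance instead of a straight write.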
A daily 24-hour digest email. A scheduled Recurrence flow at 7am reads the WorkOrders rows where CreatedDate >= utcNow() - 24h, builds a short HTML table grouped by trade category and priority, and sends it to a dispatcher mailbox via Send an email (V2). The report itself is useful, but the real point is the early warning when zero rows arrive. If the Outlook trigger broke overnight, or AI Builder credits ran out, or Document Intelligence rate-limited, an empty digest tells the dispatcher within hours instead of days. The same pattern works for any silent-fail-prone flow: a daily heartbeat with a count is cheap insurance.
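The digest logic itself is just a filter and a group-by. A hedged Python sketch, assuming each WorkOrders row carries a CreatedDate timestamp and a TradeCategory (column names from the build; the function is illustrative):

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def digest_counts(rows, now=None, window_hours=24):
    """Count last-24h work orders per trade category.
    An empty result is the early-warning signal."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=window_hours)
    recent = [r for r in rows if r["CreatedDate"] >= cutoff]
    return Counter(r["TradeCategory"] for r in recent)
```

The cheap part is deliberate: a scheduled flow that can render this as a short HTML table costs minutes to build, and the zero-row case is the whole payoff.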
If you have the same Reddit question (scattered information across email body and PDF, "AI Builder vs Copilot Studio Work IQ vs build it yourself") the architecture above is the one I would copy: combined-context concatenation, single Azure OpenAI call with an explicit confidence field in the JSON contract, confidence-gated routing to either the operational table or an exceptions table. If you want your agent (or your agent's PM) to inspect action-level inputs and outputs the way ours did, Flow Studio MCP is what made that inspection possible. There is a free Starter plan.
If you already have an Azure OpenAI deployment and a Document Intelligence resource, the HTTP + Azure path (Path B in [01]) gives you fine-grained control of model parameters and is cheaper per call at scale. If you are starting from a Power Platform license with no Azure footprint, the AI Builder native path (Path A) is portable, requires no Azure resources, and uses one AI Builder Run a prompt action with JSON output mode that removes the Parse JSON step entirely.
Combine before extracting. Concatenate the email subject, email body, and the OCR text of every PDF attachment into one context string, then send a single prompt to the model. Do not run separate extraction passes on the email and the PDF. The model decides which field came from where, and the same prompt works whether the property name is in the email body, the PDF, or both. Code in [04].
AddRowV2 writing blank rows even though the run succeeded? The Excel Online Business connector reads dynamic top-level keys named item/<ColumnName>. A nested item: {...} object is structurally valid JSON, the connector returns 200 OK, the row appears, and every cell is empty. Flatten the body to { "item/Title": "...", "item/PropertyAddress": "..." }. Discoverable in the connector swagger. Diagnosis story in [06].
Verify the magic bytes of the destination file, not just the size. PDFs start with %PDF-1.4 (binary). If you see JVBERi0x, you wrote base64 ASCII text into a .pdf file because the upstream flow passed the attachment as a base64 string. Wrap with @base64ToBinary('...') in the Power Automate expression. Story in [05].
Top-level run status is not enough. The flow can be Succeeded while the connector silently drops your data. Inspect action-level inputs and outputs (e.g. via Flow Studio MCP's get_live_flow_run_action_outputs) to see what each connector actually received and returned. Add a daily 24-hour digest email so an empty digest becomes an early warning when the trigger or an AI dependency breaks overnight.
About Flow Studio MCP: Flow Studio MCP is a Model Context Protocol server that gives AI agents action-level visibility into Power Automate. It works with GitHub Copilot, Claude, Codex, and any MCP-compatible agent. Available on GitHub via the awesome-copilot list.
Related reading:
Catherine Han, Flow Studio