Invoice Data Extraction from PDF
Featured · Data · #Invoice #PDF #LlamaParse #Agent #Data Extraction #Accounting
An AI agent calls LlamaParse to parse a PDF invoice, then returns structured JSON with vendor, amount, line items, and totals.
4 nodes · Free & source-available
Stop copying invoice data by hand. Provide a PDF URL, and the InvoiceExtractor agent calls the LlamaParse tool to get clean markdown, then returns a structured JSON object with the invoice fields (vendor, number, date, total, line items), ready for QuickBooks, Xero, or a DataTable.
What this workflow does
- InvoiceURL — provide the PDF URL (paste it in or swap for a Webhook trigger)
- InvoiceExtractor — agent receives the URL, calls the llamaParseAPI tool, and returns structured JSON
- llamaParseAPI — HTTP tool node: agent POSTs the PDF URL to LlamaParse and gets back clean markdown
- InvoiceData — output with structured invoice JSON (vendor, number, date, total, line items)
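The shape of the InvoiceData output can be sketched in Python for anyone consuming the JSON downstream. This page does not list the exact JSON keys, so the key names below (`vendor`, `number`, `date`, `total`, `line_items`, `description`, `quantity`, `unit_price`) are assumptions derived from the fields named above; adjust them to whatever schema the agent's system instruction actually defines.

```python
import json
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        return self.quantity * self.unit_price


@dataclass
class Invoice:
    # Key names are assumed, not taken from the template itself.
    vendor: str
    number: str
    date: str  # kept as the string the agent returned
    total: Decimal
    line_items: list[LineItem] = field(default_factory=list)


def parse_invoice(raw: str) -> Invoice:
    """Turn the agent's JSON text into an Invoice; raises KeyError on missing keys."""
    data = json.loads(raw)
    items = [
        LineItem(
            description=item["description"],
            # Go through str() so floats from JSON do not carry binary noise.
            quantity=Decimal(str(item["quantity"])),
            unit_price=Decimal(str(item["unit_price"])),
        )
        for item in data.get("line_items", [])
    ]
    return Invoice(
        vendor=data["vendor"],
        number=data["number"],
        date=data["date"],
        total=Decimal(str(data["total"])),
        line_items=items,
    )


def line_items_match_total(invoice: Invoice) -> bool:
    """True when the line items sum exactly to the total (tax is ignored here)."""
    summed = sum((item.amount for item in invoice.line_items), Decimal("0"))
    return summed == invoice.total
```

A check like `line_items_match_total` is one cheap way to flag extractions that need a human look before they reach accounting software; invoices with separate tax or discount lines would need a looser comparison.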
Use cases
- Automated AP data entry from emailed PDF invoices
- Invoice pre-processing before uploading to accounting software
- Batch ingestion of historical invoices into a DataTable
Setup
- Open the llamaParseAPI node and replace YOUR_LLAMAPARSE_KEY in the curl command with your real key from cloud.llamaindex.ai.
- Open InvoiceExtractor and connect an OpenAI-compatible credential.
- Run once with the sample URL — the agent calls LlamaParse and returns a JSON invoice object in the output panel.
Notes
- LlamaParse handles scanned PDFs with OCR automatically.
- Extend the JSON schema in the agent's system instruction to capture PO numbers, tax IDs, or additional line-item fields.
- For high volume, add a Loop upstream and pass each PDF URL through the same workflow.
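The high-volume note above amounts to running the same extraction once per URL. As a rough sketch of that loop outside Heym, the function below takes a hypothetical `extract_invoice` callable that stands in for one run of this workflow (it is not a Heym or LlamaParse API) and keeps failures separate so one bad PDF does not stop the batch.

```python
from typing import Callable


def run_batch(
    pdf_urls: list[str],
    extract_invoice: Callable[[str], dict],
) -> tuple[list[dict], list[tuple[str, str]]]:
    """Run each URL through extract_invoice; return (results, failures)."""
    results: list[dict] = []
    failures: list[tuple[str, str]] = []
    for url in pdf_urls:
        try:
            invoice = extract_invoice(url)
        except Exception as exc:  # record the failure and move on
            failures.append((url, str(exc)))
            continue
        # Keep the source URL with each record so rows can be traced back.
        results.append({"source_url": url, **invoice})
    return results, failures
```

Keeping the source URL on every record makes it easier to re-run only the failed or suspicious invoices later.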
How to import this template
1. Click Import → Copy JSON on this page.
2. Open your Heym instance and navigate to a workflow canvas.
3. Press Cmd+V / Ctrl+V; the nodes appear instantly.
4. Add your API keys in the node config panels and click Run.