Build an accounts payable automation that watches a Box folder for new invoices, extracts structured fields with Box AI, and writes metadata back to each file.
Use this file to discover all available pages before exploring further.
Manual invoice processing is slow, error-prone, and expensive to scale. Every PDF that lands in an inbox needs someone to open it, read line items, and key data into a spreadsheet or ERP.This tutorial builds an automated intake service that eliminates that manual work. When a new PDF arrives in a designated Box folder, the service calls Box AI to extract predictable fields - vendor name, invoice number, total, dates - and writes those values back to the file as Box metadata.
In the Invoices tab, click New and name it Invoice.
Add the following fields:
Field name
Type
Description
Vendor Name
Text
The name of the vendor or supplier issuing the invoice
Invoice Number
Text
The unique invoice identifier
Invoice Date
Date
The date the invoice was issued
Due Date
Date
The payment due date
Total Amount
Number
The total amount due, including taxes and fees
Currency
Text
The three-letter currency code (for example, USD, EUR, GBP)
Copy the template key from under the Template Name. You need it in a later step.
Click Save.
For a detailed walkthrough, see .
2
Create the invoices inbox folder
Create a dedicated folder in Box to serve as the invoice drop zone.
In Box, create a new folder called Invoices Inbox.
Note the folder ID from the URL. For example, if the URL is https://app.box.com/folder/123456789, the folder ID is 123456789.
Share the folder with your application’s service account. This is required because CCG applications act as a separate service account user that does not automatically have access to your content.
This step is critical. Without it, all API calls return 404 “Not found” errors.To find your service account email: go to the Developer Console, open your app, and look under General Settings for the Service Account ID (it looks like AutomationUser_xxxxx_xxxxxx@boxdevedition.com).Invite this email as a collaborator on the folder with the Editor role. Editor access is required because the app needs to write metadata back to files.
3
Set up the development environment
Open your terminal and create a new project directory:
mkdir invoice-intake && cd invoice-intake
Create and activate a Python virtual environment:
python3 -m venv .venvsource .venv/bin/activate
After activation, your terminal prompt shows (.venv) at the beginning. This confirms you are working inside the virtual environment.
Every time you open a new terminal window or tab, you must re-activate the virtual environment by running source .venv/bin/activate from the project directory. If you see ModuleNotFoundError when running commands, it usually means the venv is not activated.
Install the required packages:
pip install box-sdk-gen flask python-dotenv
Create a .env file to store your credentials then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:
Never commit .env files to version control. Add .env to your .gitignore.
Understanding environment variables: The .env file stores sensitive values (your actual credentials). Your Python code reads these values by referencing their names using os.getenv("VARIABLE_NAME"). For example, os.getenv("BOX_CLIENT_ID") looks up the value stored next to BOX_CLIENT_ID= in your .env file. When you copy the code in the following steps, keep the quoted variable names exactly as shown. Do not replace them with your actual credentials.
4
Authenticate the Box client
Create a new file called box_client.py in your project directory. Open the file and paste the following code:
Client Credentials Grant is recommended for server-to-server automations where no end user is present. For other authentication options, see .
5
Build the extraction function
Create a new file called extract.py and paste the following code. This is the core of the service. It takes a file ID, calls Box AI to extract fields using your metadata template, and returns the structured result.
The metadata template tells Box AI exactly which fields to look for and what data types to return. This means the response shape is predictable and consistent, regardless of how each vendor formats their invoices.
6
Write metadata back to the file
Create a new file called metadata.py and paste the following code. After extraction, this function attaches the extracted values to the file as a metadata instance:
Once metadata is attached, the extracted fields become searchable, filterable, and visible in the Box web app. You can use to find all invoices above a certain amount, filter by vendor, or build dashboards in Box Apps.
7
Create the webhook listener
Create a new file called app.py and paste the following code. This Flask application receives webhook notifications when new files arrive in the inbox folder:
You can test the extraction pipeline locally without setting up a public URL or webhook. This step simulates what Box would send when a new file arrives.
This step requires two terminal windows open at the same time. Terminal 1 runs the Flask server (which must stay running). Terminal 2 sends a test request to it.
Terminal 1 - start the server:Make sure you are in the invoice-intake directory and the virtual environment is activated:
cd ~/invoice-intakesource .venv/bin/activatepython3 app.py
You should see:
* Running on http://127.0.0.1:5000
Leave this terminal running.Terminal 2 - send a test request:Open a new terminal tab or window. Send a simulated webhook payload using curl. Replace <FILE_ID> with the file ID of the invoice PDF you uploaded to Box.
Use the file ID, not the folder ID. The file ID comes from the file’s URL: https://app.box.com/file/123456789 → the file ID is 123456789. The folder ID comes from a different URL pattern: https://app.box.com/folder/987654321.
Open the file in Box and click the Metadata tab to verify the values were written correctly.
9
Register a webhook (production)
The local curl test simulates what Box sends, but for a production deployment you need Box to send real webhook notifications automatically. This requires a publicly accessible HTTPS endpoint:
Replace <FOLDER_ID> with your invoices folder ID and update the address to your tunnel URL with /webhook appended.Once registered, any PDF uploaded to the folder automatically triggers extraction and metadata application.
Your virtual environment is not activated. Run source .venv/bin/activate from the project directory before running any python3 commands. Each new terminal tab needs its own activation.
invalid_client: The client credentials are invalid
Check your .env file:
Verify BOX_CLIENT_ID and BOX_CLIENT_SECRET match the values in Developer Console > Configuration.
Confirm BOX_ENTERPRISE_ID is your enterprise ID (found in Admin Console > Account & Billing, or Developer Console > icon in the top-right > Copy Enterprise ID).
Ensure your app is authorized in the Developer Console.
Make sure the app type is Client Credentials Grant.
404 Not Found
The service account does not have access to the file or folder. Invite the service account email (found in Developer Console > General Settings) as a collaborator with Editor role on the folder containing your invoice files.
metadata_template must have both scope and template_key
The BOX_METADATA_TEMPLATE_KEY value in your .env file is missing or empty. Add the template key you noted when creating the metadata template in Step 1.
Box webhook delivery can produce duplicates. Make your handler idempotent by checking whether metadata already exists on the file before processing. Use GET /2.0/files/:id/metadata/enterprise/:template and skip extraction if an instance is already present.
Process at scale with event streams
For high-volume environments, consider using instead of webhooks. Enterprise events provide a durable, polling-based stream that is better suited to batch processing thousands of invoices.
Use the Enhanced Extract Agent for complex invoices
If your invoices contain complex layouts, multi-page line items, or non-standard formatting, use the for improved accuracy. Specify the agent in your extraction call:
from box_sdk_gen import AiAgentReference, AiAgentReferenceTypeFieldenhanced_agent = AiAgentReference( id="enhanced_extract_agent", type=AiAgentReferenceTypeField.AI_AGENT_ID,)