What you are building
By the end of this tutorial, you have a working Python service that:- Checks for new files in a designated ‘Invoices Inbox’ folder using Box webhooks.
- Calls the Box AI structured extract endpoint with a metadata template.
- Writes the extracted key-value pairs back to the file as Box metadata.
- Optionally logs extracted totals for downstream ERP integration.
Prerequisites
Before you start, make sure you have the following:- A with Box AI enabled.
- A Box application configured with Client Credentials Grant authentication.
- Python 3.11 or higher.
- The following scopes enabled on your app:
- Read all files and folders stored in Box
- Write all files and folders stored in Box
- Manage AI
- Manage webhooks
Step-by-step process
This solution uses three Box Platform capabilities:| Component | Purpose | API |
|---|---|---|
| Webhooks | Detect new files arriving in the inbox folder | POST /2.0/webhooks |
| Box AI Extract | Pull structured fields from each invoice PDF | POST /2.0/ai/extract_structured |
| Metadata | Write extracted values back to the file | POST /2.0/files/:id/metadata/:scope/:template |
Create the metadata template
- Open the Box Admin Console and select Metadata.
- In the Invoices tab, click New and name it
Invoice. - Add the following fields:
| Field name | Type | Description |
|---|---|---|
| Vendor Name | Text | The name of the vendor or supplier issuing the invoice |
| Invoice Number | Text | The unique invoice identifier |
| Invoice Date | Date | The date the invoice was issued |
| Due Date | Date | The payment due date |
| Total Amount | Number | The total amount due, including taxes and fees |
| Currency | Text | The three-letter currency code (for example, USD, EUR, GBP) |
- Copy the template key from under the Template Name. You need it in a later step.
- Click Save.
Create the invoices inbox folder
- In Box, create a new folder called
Invoices Inbox. - Note the folder ID from the URL. For example, if the URL is
https://app.box.com/folder/123456789, the folder ID is123456789. - Share the folder with your application’s service account. This is required because CCG applications act as a separate service account user that does not automatically have access to your content.
Set up the development environment
- Open your terminal and create a new project directory:
- Create and activate a Python virtual environment:
(.venv) at the beginning. This confirms you are working inside the virtual environment.source .venv/bin/activate from the project directory. If you see ModuleNotFoundError when running commands, it usually means the venv is not activated.- Install the required packages:
- Create a
.envfile to store your credentials then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:
.env file stores sensitive values (your actual credentials). Your Python code reads these values by referencing their names using os.getenv("VARIABLE_NAME"). For example, os.getenv("BOX_CLIENT_ID") looks up the value stored next to BOX_CLIENT_ID= in your .env file. When you copy the code in the following steps, keep the quoted variable names exactly as shown. Do not replace them with your actual credentials.Authenticate the Box client
box_client.py in your project directory. Open the file and paste the following code:Build the extraction function
extract.py and paste the following code. This is the core of the service. It takes a file ID, calls Box AI to extract fields using your metadata template, and returns the structured result.Write metadata back to the file
metadata.py and paste the following code. After extraction, this function attaches the extracted values to the file as a metadata instance:Create the webhook listener
app.py and paste the following code. This Flask application receives webhook notifications when new files arrive in the inbox folder:Test the integration
invoice-intake directory and the virtual environment is activated:<FILE_ID> with the file ID of the invoice PDF you uploaded to Box.Register a webhook (production)
<FOLDER_ID> with your invoices folder ID and update the address to your tunnel URL with /webhook appended.Once registered, any PDF uploaded to the folder automatically triggers extraction and metadata application.Troubleshooting
ModuleNotFoundError: No module named '...'
ModuleNotFoundError: No module named '...'
source .venv/bin/activate from the project directory before running any python3 commands. Each new terminal tab needs its own activation.invalid_client: The client credentials are invalid
invalid_client: The client credentials are invalid
.env file:- Verify
BOX_CLIENT_IDandBOX_CLIENT_SECRETmatch the values in Developer Console > Configuration. - Confirm
BOX_ENTERPRISE_IDis your enterprise ID (found in Admin Console > Account & Billing, or Developer Console > icon in the top-right > Copy Enterprise ID). - Ensure your app is authorized in the Developer Console.
- Make sure the app type is Client Credentials Grant.
404 Not Found
404 Not Found
metadata_template must have both scope and template_key
metadata_template must have both scope and template_key
BOX_METADATA_TEMPLATE_KEY value in your .env file is missing or empty. Add the template key you noted when creating the metadata template in Step 1.Optional: push totals to an ERP
Once you have structured metadata, pushing data downstream is straightforward. Add an ERP integration step after metadata is applied:Scaling to production
Handle duplicate deliveries
Handle duplicate deliveries
GET /2.0/files/:id/metadata/enterprise/:template and skip extraction if an instance is already present.Process at scale with event streams
Process at scale with event streams
Use the Enhanced Extract Agent for complex invoices
Use the Enhanced Extract Agent for complex invoices
