Skip to main content

Documentation Index

Fetch the complete documentation index at: https://developer.box.com/llms.txt

Use this file to discover all available pages before exploring further.

Manual invoice processing is slow, error-prone, and expensive to scale. Every PDF that lands in an inbox needs someone to open it, read line items, and key data into a spreadsheet or ERP. This tutorial builds an automated intake service that eliminates that manual work. When a new PDF arrives in a designated Box folder, the service calls Box AI to extract predictable fields - vendor name, invoice number, total, dates - and writes those values back to the file as Box metadata.

What you are building

By the end of this tutorial, you have a working Python service that:
  • Checks for new files in a designated ‘Invoices Inbox’ folder using Box webhooks.
  • Calls the Box AI structured extract endpoint with a metadata template.
  • Writes the extracted key-value pairs back to the file as Box metadata.
  • Optionally logs extracted totals for downstream ERP integration.

Prerequisites

Before you start, make sure you have the following:
  • A with Box AI enabled.
  • A Box application configured with Client Credentials Grant authentication.
  • Python 3.11 or higher.
  • The following scopes enabled on your app:
    • Read all files and folders stored in Box
    • Write all files and folders stored in Box
    • Manage AI
    • Manage webhooks

Step-by-step process

This solution uses three Box Platform capabilities:
ComponentPurposeAPI
WebhooksDetect new files arriving in the inbox folderPOST /2.0/webhooks
Box AI ExtractPull structured fields from each invoice PDFPOST /2.0/ai/extract_structured
MetadataWrite extracted values back to the filePOST /2.0/files/:id/metadata/:scope/:template
1

Create the metadata template

The metadata template defines the fields Box AI extracts. Create it once, and every invoice processed by the service returns values in this shape.
This step requires Admin access. If you do not have access, contact your Box administrator.
  1. Open the Box Admin Console and select Metadata.
  2. In the Invoices tab, click New and name it Invoice.
  3. Add the following fields:
Field nameTypeDescription
Vendor NameTextThe name of the vendor or supplier issuing the invoice
Invoice NumberTextThe unique invoice identifier
Invoice DateDateThe date the invoice was issued
Due DateDateThe payment due date
Total AmountNumberThe total amount due, including taxes and fees
CurrencyTextThe three-letter currency code (for example, USD, EUR, GBP)
  1. Copy the template key from under the Template Name. You need it in a later step.
  2. Click Save.
For a detailed walkthrough, see .
2

Create the invoices inbox folder

Create a dedicated folder in Box to serve as the invoice drop zone.
  1. In Box, create a new folder called Invoices Inbox.
  2. Note the folder ID from the URL. For example, if the URL is https://app.box.com/folder/123456789, the folder ID is 123456789.
  3. Share the folder with your application’s service account. This is required because CCG applications act as a separate service account user that does not automatically have access to your content.
This step is critical. Without it, all API calls return 404 “Not found” errors.To find your service account email: go to the Developer Console, open your app, and look under General Settings for the Service Account ID (it looks like AutomationUser_xxxxx_xxxxxx@boxdevedition.com).Invite this email as a collaborator on the folder with the Editor role. Editor access is required because the app needs to write metadata back to files.
3

Set up the development environment

  1. Open your terminal and create a new project directory:
mkdir invoice-intake && cd invoice-intake
  1. Create and activate a Python virtual environment:
python3 -m venv .venv
source .venv/bin/activate
After activation, your terminal prompt shows (.venv) at the beginning. This confirms you are working inside the virtual environment.
Every time you open a new terminal window or tab, you must re-activate the virtual environment by running source .venv/bin/activate from the project directory. If you see ModuleNotFoundError when running commands, it usually means the venv is not activated.
  1. Install the required packages:
pip install box-sdk-gen flask python-dotenv
  1. Create a .env file to store your credentials then add the following content. Replace the placeholder values with your actual credentials from the Box Developer Console:
BOX_CLIENT_ID=your_client_id
BOX_CLIENT_SECRET=your_client_secret
BOX_ENTERPRISE_ID=your_enterprise_id
BOX_METADATA_TEMPLATE_KEY=your_metadata_template_key
INVOICES_FOLDER_ID=your_folder_id
Never commit .env files to version control. Add .env to your .gitignore.
Understanding environment variables: The .env file stores sensitive values (your actual credentials). Your Python code reads these values by referencing their names using os.getenv("VARIABLE_NAME"). For example, os.getenv("BOX_CLIENT_ID") looks up the value stored next to BOX_CLIENT_ID= in your .env file. When you copy the code in the following steps, keep the quoted variable names exactly as shown. Do not replace them with your actual credentials.
4

Authenticate the Box client

Create a new file called box_client.py in your project directory. Open the file and paste the following code:
import os
from dotenv import load_dotenv
from box_sdk_gen import (
    BoxClient,
    BoxCCGAuth,
    CCGConfig,
)

load_dotenv()

def get_box_client() -> BoxClient:
    config = CCGConfig(
        client_id=os.getenv("BOX_CLIENT_ID"),
        client_secret=os.getenv("BOX_CLIENT_SECRET"),
        enterprise_id=os.getenv("BOX_ENTERPRISE_ID"),
    )
    auth = BoxCCGAuth(config=config)
    return BoxClient(auth=auth)
Client Credentials Grant is recommended for server-to-server automations where no end user is present. For other authentication options, see .
5

Build the extraction function

Create a new file called extract.py and paste the following code. This is the core of the service. It takes a file ID, calls Box AI to extract fields using your metadata template, and returns the structured result.
import os
from dotenv import load_dotenv
from box_sdk_gen import (
    AiItemBase,
    BoxClient,
    CreateAiExtractStructuredMetadataTemplate,
    CreateAiExtractStructuredMetadataTemplateTypeField,
)

load_dotenv()

def extract_invoice_fields(client: BoxClient, file_id: str) -> dict:
    template_key = os.getenv("BOX_METADATA_TEMPLATE_KEY")

    result = client.ai.create_ai_extract_structured(
        items=[AiItemBase(id=file_id)],
        metadata_template=CreateAiExtractStructuredMetadataTemplate(
            template_key=template_key,
            type=CreateAiExtractStructuredMetadataTemplateTypeField.METADATA_TEMPLATE,
            scope="enterprise",
        ),
    )

    return result.to_dict()["answer"]
The metadata template tells Box AI exactly which fields to look for and what data types to return. This means the response shape is predictable and consistent, regardless of how each vendor formats their invoices.
6

Write metadata back to the file

Create a new file called metadata.py and paste the following code. After extraction, this function attaches the extracted values to the file as a metadata instance:
import os
from dotenv import load_dotenv
from box_sdk_gen import BoxClient, CreateFileMetadataByIdScope

load_dotenv()

def apply_metadata(client: BoxClient, file_id: str, metadata: dict) -> dict:
    template_key = os.getenv("BOX_METADATA_TEMPLATE_KEY")

    attached = client.file_metadata.create_file_metadata_by_id(
        file_id=file_id,
        scope=CreateFileMetadataByIdScope.ENTERPRISE,
        template_key=template_key,
        request_body=metadata,
    )

    return attached.to_dict()
Once metadata is attached, the extracted fields become searchable, filterable, and visible in the Box web app. You can use to find all invoices above a certain amount, filter by vendor, or build dashboards in Box Apps.
7

Create the webhook listener

Create a new file called app.py and paste the following code. This Flask application receives webhook notifications when new files arrive in the inbox folder:
import os
import json

from dotenv import load_dotenv
from flask import Flask, request, jsonify

from box_client import get_box_client
from extract import extract_invoice_fields
from metadata import apply_metadata

load_dotenv()
app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_webhook():
    payload = request.get_json()

    if payload.get("trigger") != "FILE.UPLOADED":
        return jsonify({"status": "ignored"}), 200

    file_id = payload["source"]["id"]
    file_name = payload["source"]["name"]

    if not file_name.lower().endswith(".pdf"):
        return jsonify({"status": "skipped, not a PDF"}), 200

    client = get_box_client()

    extracted = extract_invoice_fields(client, file_id)
    print(f"Extracted from {file_name}: {json.dumps(extracted, indent=2)}")

    apply_metadata(client, file_id, extracted)
    print(f"Metadata applied to file {file_id}")

    return jsonify({"status": "processed", "file_id": file_id}), 200

if __name__ == "__main__":
    app.run(port=5000)
In production, you should verify webhook signatures to confirm that requests originate from Box. See for implementation details.
At this point, your project directory should contain the following files:
invoice-intake/
├── .env
├── .venv/
├── app.py
├── box_client.py
├── extract.py
└── metadata.py
8

Test the integration

You can test the extraction pipeline locally without setting up a public URL or webhook. This step simulates what Box would send when a new file arrives.
This step requires two terminal windows open at the same time. Terminal 1 runs the Flask server (which must stay running). Terminal 2 sends a test request to it.
Terminal 1 - start the server:Make sure you are in the invoice-intake directory and the virtual environment is activated:
cd ~/invoice-intake
source .venv/bin/activate
python3 app.py
You should see:
* Running on http://127.0.0.1:5000
Leave this terminal running.Terminal 2 - send a test request:Open a new terminal tab or window. Send a simulated webhook payload using curl. Replace <FILE_ID> with the file ID of the invoice PDF you uploaded to Box.
Use the file ID, not the folder ID. The file ID comes from the file’s URL: https://app.box.com/file/123456789 → the file ID is 123456789. The folder ID comes from a different URL pattern: https://app.box.com/folder/987654321.
curl -X POST http://127.0.0.1:5000/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "trigger": "FILE.UPLOADED",
    "source": {
      "id": "<FILE_ID>",
      "name": "sample-invoice.pdf"
    }
  }'
Check the result:Switch back to Terminal 1. You should see the extracted fields printed, followed by a confirmation that metadata was applied:
Extracted from sample-invoice.pdf: {
  "vendorName": "ACME Corp",
  "invoiceNumber": "INV-001",
  "invoiceDate": "2025-03-15T00:00:00Z",
  "dueDate": "2025-04-15T00:00:00Z",
  "totalAmount": 1250.00,
  "currency": "USD"
}
Metadata applied to file 123456789
Open the file in Box and click the Metadata tab to verify the values were written correctly.
9

Register a webhook (production)

The local curl test simulates what Box sends, but for a production deployment you need Box to send real webhook notifications automatically. This requires a publicly accessible HTTPS endpoint:
curl -X POST https://api.box.com/2.0/webhooks \
  -H "Authorization: Bearer <ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "target": {
      "id": "<FOLDER_ID>",
      "type": "folder"
    },
    "address": "<webhook_url>",
    "triggers": ["FILE.UPLOADED"]
  }'
Replace <FOLDER_ID> with your invoices folder ID and update the address to your tunnel URL with /webhook appended.Once registered, any PDF uploaded to the folder automatically triggers extraction and metadata application.

Troubleshooting

Your virtual environment is not activated. Run source .venv/bin/activate from the project directory before running any python3 commands. Each new terminal tab needs its own activation.
Check your .env file:
  • Verify BOX_CLIENT_ID and BOX_CLIENT_SECRET match the values in Developer Console > Configuration.
  • Confirm BOX_ENTERPRISE_ID is your enterprise ID (found in Admin Console > Account & Billing, or Developer Console > icon in the top-right > Copy Enterprise ID).
  • Ensure your app is authorized in the Developer Console.
  • Make sure the app type is Client Credentials Grant.
The service account does not have access to the file or folder. Invite the service account email (found in Developer Console > General Settings) as a collaborator with Editor role on the folder containing your invoice files.
The BOX_METADATA_TEMPLATE_KEY value in your .env file is missing or empty. Add the template key you noted when creating the metadata template in Step 1.

Optional: push totals to an ERP

Once you have structured metadata, pushing data downstream is straightforward. Add an ERP integration step after metadata is applied:
def push_to_erp(extracted: dict, file_id: str):
    """Send extracted totals to your ERP system."""
    erp_payload = {
        "vendor": extracted.get("vendorName"),
        "invoice_number": extracted.get("invoiceNumber"),
        "total": extracted.get("totalAmount"),
        "currency": extracted.get("currency"),
        "due_date": extracted.get("dueDate"),
        "source_file_id": file_id,
    }
    # Replace with your ERP's API endpoint
    # requests.post("https://erp.example.com/api/invoices", json=erp_payload)
    print(f"ERP payload ready: {erp_payload}")

Scaling to production

Box webhook delivery can produce duplicates. Make your handler idempotent by checking whether metadata already exists on the file before processing. Use GET /2.0/files/:id/metadata/enterprise/:template and skip extraction if an instance is already present.
For high-volume environments, consider using instead of webhooks. Enterprise events provide a durable, polling-based stream that is better suited to batch processing thousands of invoices.
If your invoices contain complex layouts, multi-page line items, or non-standard formatting, use the for improved accuracy. Specify the agent in your extraction call:
from box_sdk_gen import AiAgentReference, AiAgentReferenceTypeField

enhanced_agent = AiAgentReference(
    id="enhanced_extract_agent",
    type=AiAgentReferenceTypeField.AI_AGENT_ID,
)

Next steps

Sales RFP answer bank

Build an AI-powered knowledge base for sales teams using Box Hubs and Box AI.

Extract API reference

See the full API specification for structured extraction.