
With the Box AI API, you can extract metadata from a file and get the result as key-value pairs. As input, you can either define a structure using the fields parameter or use an existing metadata template. To learn more about creating templates, see Creating metadata templates in the Admin Console or use the . You can also autofill metadata in templates using our Standard or Enhanced Extract Agent.

Supported file formats

The endpoint supports the following file formats:
  • PDF
  • DOC
  • DOCX
  • GDOC
  • ODT
  • Box Note
  • TEXT
  • RTF
  • XDW
  • AS
  • TIFF
  • TIF
  • PNG
  • JPEG
  • JPG
  • WEBP
  • PPT
  • PPTX
  • GSLIDE
  • GSLIDES
  • ODP
  • OTP
  • XLS
  • XLSX
  • XLSM
  • ODS
  • CSV
  • Source code: .js, .py, .css, .php, .sql
  • JSON
  • HTML
  • XML
  • MD
Box AI automatically applies optical character recognition (OCR) when processing image files (TIFF, PNG, JPEG) and scanned documents. This eliminates the need to convert images to PDF before extraction, saving time and simplifying your integration.
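Before calling the endpoint, you may want to check a file's extension against the list above. The following is a minimal pre-flight sketch. It is built on assumptions that are not part of the Box API. The format-to-extension mapping is our guess, for example `.txt` for TEXT and `.boxnote` for Box Note. `classify_file` is a hypothetical helper name. The OCR set only covers the image formats named in the paragraph above.

```python
from pathlib import PurePath

# Extensions assumed to correspond to the formats listed above.
# Assumption: ".txt" stands for TEXT and ".boxnote" for Box Note.
SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "gdoc", "odt", "boxnote", "txt", "rtf", "xdw", "as",
    "tiff", "tif", "png", "jpeg", "jpg", "webp",
    "ppt", "pptx", "gslide", "gslides", "odp", "otp",
    "xls", "xlsx", "xlsm", "ods", "csv",
    "js", "py", "css", "php", "sql", "json", "html", "xml", "md",
}

# Image formats that the text above says Box AI processes with OCR.
# Scanned PDFs are also OCR'd, but an extension alone cannot reveal that.
OCR_EXTENSIONS = {"tiff", "tif", "png", "jpeg", "jpg"}


def classify_file(filename: str) -> str:
    """Return 'unsupported', 'ocr', or 'standard' for a file name (hypothetical helper)."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_EXTENSIONS:
        return "unsupported"
    return "ocr" if suffix in OCR_EXTENSIONS else "standard"


if __name__ == "__main__":
    for name in ["invoice.PDF", "scan.jpg", "archive.zip", "README"]:
        print(name, classify_file(name))
```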

Supported languages

Box AI can extract metadata from documents in the following languages:
  • English
  • Japanese
  • Chinese
  • Korean
  • Cyrillic-based languages (such as Russian, Ukrainian, Bulgarian, and Serbian)
No additional configuration is required to use different languages or image formats. Box AI automatically detects the language and applies OCR when needed.

Before you start

Make sure you followed the steps listed in to create a platform app and authenticate.

Send a request

To send a request, use the POST /2.0/ai/extract_structured endpoint. The following request defines its structure with the fields parameter. To use a metadata template instead, replace fields with a metadata_template object.
curl -i -L 'https://api.box.com/2.0/ai/extract_structured' \
     -H 'content-type: application/json' \
     -H 'authorization: Bearer <ACCESS_TOKEN>' \
     -d '{
        "items": [
          {
            "id": "12345678",
            "type": "file",
            "content": "This is file content."
          }
        ],
        "fields": [
          {
            "key": "name",
            "description": "The name of the person.",
            "displayName": "Name",
            "prompt": "The name is the first and last name from the email address.",
            "type": "string",
            "options": [
              {
                "key": "First Name"
              },
              {
                "key": "Last Name"
              }
            ]
          }
        ],
        "ai_agent": {
          "type": "ai_agent_extract_structured",
          "long_text": {
            "model": "azure__openai__gpt_4o_mini"
          },
          "basic_text": {
            "model": "azure__openai__gpt_4o_mini"
          }
        }
      }'
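If you call the endpoint from Python without an SDK, the curl request above maps onto the standard library's `urllib.request`. This is a sketch only. The helper names `build_extract_request` and `send_extract_request` are hypothetical, and the example body is trimmed to the essentials. Nothing is sent unless you call `send_extract_request` with a valid token.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.box.com/2.0/ai/extract_structured"


def build_extract_request(access_token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the extract_structured endpoint."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


def send_extract_request(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    body = {
        "items": [{"id": "12345678", "type": "file"}],
        "fields": [{"key": "name", "type": "string", "prompt": "The name of the person."}],
    }
    req = build_extract_request("<ACCESS_TOKEN>", body)
    print(req.get_method(), req.full_url)
    # send_extract_request(req) performs the real call once you supply a valid token.
```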

Parameters

To make a call, pass the following parameters. The items.id and items.type parameters are mandatory, and you must also provide either metadata_template or fields. The items array must contain exactly one element. For prompt and file limits, see .

| Parameter | Description | Example |
|---|---|---|
| metadata_template | The metadata template containing the fields to extract. For your request to work, you must provide either metadata_template or fields, but not both. | |
| metadata_template.type | The type of metadata template. | metadata_template |
| metadata_template.scope | The scope of the metadata template, either global or enterprise. Global templates are available to any Box enterprise, whereas enterprise templates are bound to a specific enterprise. | enterprise |
| metadata_template.template_key | The name of your metadata template. | invoice |
| items.id | Box file ID of the document. The ID must reference an actual file with an extension. | 1233039227512 |
| items.type | The type of the supplied input. | file |
| ai_agent | Overrides the default model configuration. Lets you change the model, prompt template, system message, or LLM parameters. See for how it works and for examples. | |
| include_confidence_score | A flag to indicate whether to include the confidence score for every extracted field. | true |
| include_reference | A flag to indicate whether to include references for every extracted field. | true |
| items.content | The content of the item, often the text representation. | This article is about Box AI. |
| fields.description | A description of the field. | The person's name. |
| fields.displayName | The display name of the field. | Name |
| fields.key | A unique identifier for the field. | name |
| fields.namespace | The namespace of the taxonomy source. Required if you use a taxonomy type field from an existing taxonomy. | string |
| fields.options | A list of options for this field. This is most often used in combination with the enum and multiSelect field types. | [{"key":"First Name"},{"key":"Last Name"}] |
| fields.options.key | A unique identifier for the option. | First Name |
| fields.prompt | Additional context about the key (identifier) that can include how to find and format it. | Name is the first and last name from the email address |
| fields.type | The type of the field. It includes but is not limited to string, float, date, enum, multiSelect, struct, and table. | string |
| fields.taxonomy_key | The identifier for a taxonomy, which corresponds to the key of the taxonomy source. Required if you use a taxonomy type field. | string |
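The request rules above are easy to check client-side before spending an API call. The following sketch enforces four of them: exactly one item, a present id, either metadata_template or fields but not both, and unique field keys. `validate_extract_body` is a hypothetical name. Treating "file" as the only valid items.type is an assumption based on the examples on this page. The server remains the authority on what it accepts.

```python
def validate_extract_body(body: dict) -> None:
    """Raise ValueError if a request body breaks the rules in the table above."""
    items = body.get("items")
    if not isinstance(items, list) or len(items) != 1:
        raise ValueError("items must contain exactly one element")
    item = items[0]
    # Assumption: "file" is the only items.type used on this page.
    if not item.get("id") or item.get("type") != "file":
        raise ValueError("items[0] needs an id and type 'file'")
    has_template = "metadata_template" in body
    has_fields = "fields" in body
    # Exactly one of the two structure parameters must be present.
    if has_template == has_fields:
        raise ValueError("provide either metadata_template or fields, but not both")
    if has_fields:
        keys = [field.get("key") for field in body["fields"]]
        if any(not key for key in keys) or len(keys) != len(set(keys)):
            raise ValueError("every field needs a unique, non-empty key")


if __name__ == "__main__":
    validate_extract_body(
        {"items": [{"id": "1233039227512", "type": "file"}], "fields": [{"key": "name"}]}
    )
    print("body is valid")
```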

struct and table field types

In addition to the scalar types (string, float, date, enum, multiSelect), the Box AI extract_structured API supports two complex field types: struct and table. These types let you extract grouped and repeating structured data from documents.
For best results, use the Enhanced Extract Agent.

struct field type

Use the struct type to group multiple related sub-fields into a single named JSON object, such as an address or a person's contact details. This is useful when a set of related values belongs together and you want to receive it as one structured object rather than as separate flat fields. A struct field requires a fields array that defines its sub-fields. Each sub-field is an object with the following properties:
  • key: The unique identifier for the sub-field.
  • type: The type of the sub-field. Supported types are string, text, number, float, boolean, date, enum, multiSelect, and array[<simple_type>] (e.g. array[string]). Nested struct or table types are not supported as sub-fields.
  • displayName: The display name of the sub-field.
  • description: A description of the sub-field.
  • prompt: Additional context about the sub-field that can include how to find and format it.
You can add a prompt at the struct field level when instructions apply to the whole grouped object.
The output is a single JSON object containing the extracted sub-field values.
Example request for the struct field type:
{
  "fields": [
    {
      "key": "address",
      "displayName": "Address",
      "type": "struct",
      "fields": [
        { "key": "street_name", "type": "string" },
        { "key": "home_number", "type": "string" },
        { "key": "postal_code", "type": "string" },
        { "key": "city", "type": "string" }
      ]
    }
  ]
}
Response:
{
  "answer": {
    "address": {
      "street_name": "Main St",
      "home_number": "123",
      "postal_code": "94105",
      "city": "San Francisco"
    }
  }
}
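When your application already models the grouped data as a class, you can derive the struct definition from it and map the response object back onto it. The sketch below uses a Python dataclass for this purpose. The `PY_TO_BOX_TYPE` mapping and both helper functions are hypothetical conveniences, not part of any Box SDK. Annotations outside the mapping raise a `KeyError` on purpose.

```python
import dataclasses
import datetime

# Hypothetical mapping from Python annotations to the sub-field types Box accepts.
PY_TO_BOX_TYPE = {str: "string", float: "float", bool: "boolean", datetime.date: "date"}


@dataclasses.dataclass
class Address:
    street_name: str
    home_number: str
    postal_code: str
    city: str


def struct_field_from_dataclass(key: str, display_name: str, cls: type) -> dict:
    """Build a struct field definition whose sub-fields mirror a dataclass."""
    sub_fields = [
        {"key": f.name, "type": PY_TO_BOX_TYPE[f.type]} for f in dataclasses.fields(cls)
    ]
    return {"key": key, "displayName": display_name, "type": "struct", "fields": sub_fields}


def parse_struct_answer(response: dict, key: str, cls: type):
    """Turn the struct object under response['answer'][key] into a dataclass instance."""
    return cls(**response["answer"][key])


if __name__ == "__main__":
    print(struct_field_from_dataclass("address", "Address", Address))
```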

table field type

Use the table type to extract repeating rows of structured data as an array of JSON objects, where each object represents one row. This is useful when a document contains multiple instances of the same data structure, for example, line items in an invoice or entries in a tax table. A table field requires a fields array that defines the columns (sub-fields) of each row. The sub-field properties and supported types are identical to those of struct.
Table extraction is not limited to visually formatted tables. The table type extracts repeating data whether it appears as a grid, key-value pairs, a form layout, or plain prose.
The output is an array of JSON objects, where each object represents one extracted row.
Example request for the table field type:
{
  "fields": [
    {
      "key": "line_items",
      "displayName": "Line Items",
      "type": "table",
      "fields": [
        { "key": "description", "type": "string" },
        { "key": "quantity", "type": "float" },
        { "key": "amount", "type": "float" }
      ]
    }
  ]
}
Response:
{
  "answer": {
    "line_items": [
      { "description": "Desk", "quantity": 2.0, "amount": 399.99 },
      { "description": "Chair", "quantity": 4.0, "amount": 149.99 }
    ]
  }
}
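Because each table row is an independent JSON object, you can convert the answer row by row into typed records. The sketch below is based on the line_items example above. `LineItem` and `parse_table_answer` are hypothetical names. Coercing quantity and amount with `float()` is an assumption that guards against numbers returned as strings. The sketch does not interpret whether amount is a unit price or a line total.

```python
import dataclasses


@dataclasses.dataclass
class LineItem:
    description: str
    quantity: float
    amount: float


def parse_table_answer(response: dict, key: str) -> list[LineItem]:
    """Convert each row object in response['answer'][key] into a LineItem."""
    rows = response["answer"].get(key, [])
    return [
        LineItem(
            description=str(row["description"]),
            # float() also accepts numeric strings such as "2".
            quantity=float(row["quantity"]),
            amount=float(row["amount"]),
        )
        for row in rows
    ]


if __name__ == "__main__":
    response = {
        "answer": {
            "line_items": [
                {"description": "Desk", "quantity": 2.0, "amount": 399.99},
                {"description": "Chair", "quantity": 4.0, "amount": 149.99},
            ]
        }
    }
    for item in parse_table_answer(response, "line_items"):
        print(item)
```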

Supported sub-field types

The following types are supported within both struct and table fields.
| Type | Notes |
|---|---|
| string | Scalar or array[string] |
| text | Scalar or array[text] |
| number | Scalar or array[number] |
| float | Scalar or array[float] |
| boolean | Scalar or array[boolean] |
| date | Scalar or array[date] |
| enum | Single occurrence only |
| multiSelect | Single occurrence only |
Nested struct and table types are not supported as sub-fields.
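The table above can be encoded as a small check for sub-field type strings. The following sketch reads it literally. The six scalar types may appear alone or wrapped as array[type]. enum and multiSelect may appear only on their own. struct and table are rejected in every position. `is_valid_subfield_type` is a hypothetical helper.

```python
import re

# Types from the table that may also appear as array[<type>].
ARRAY_CAPABLE = {"string", "text", "number", "float", "boolean", "date"}
# Types that the table allows only as a single occurrence.
SINGLE_ONLY = {"enum", "multiSelect"}

_ARRAY_PATTERN = re.compile(r"array\[(\w+)\]")


def is_valid_subfield_type(type_name: str) -> bool:
    """Check a struct/table sub-field type against the supported-types table."""
    if type_name in ARRAY_CAPABLE or type_name in SINGLE_ONLY:
        return True
    match = _ARRAY_PATTERN.fullmatch(type_name)
    # array[...] is only valid around the scalar types; struct and table never nest.
    return bool(match) and match.group(1) in ARRAY_CAPABLE


if __name__ == "__main__":
    for candidate in ["string", "array[date]", "enum", "array[enum]", "struct"]:
        print(candidate, is_valid_subfield_type(candidate))
```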

Use cases

This example shows you how to extract metadata from a sample invoice in a structured way. Let’s assume you want to extract the vendor name, invoice number, and a few more details.
[Image: sample invoice]

Create the request

To get the response from Box AI, call the POST /2.0/ai/extract_structured endpoint with the following parameters:
  • items.type and items.id to specify the file to extract the data from.
  • fields to specify the data that you want to extract from the given file.
  • metadata_template to supply an already existing metadata template.
You can use either fields or metadata_template to specify your structure, but not both.

Use fields parameter

The fields parameter lets you specify the data you want to extract. Each object in the fields array accepts parameters that describe the data you are looking for. For example, you can add the field type, a description, or a prompt with additional context.
curl --location 'https://api.box.com/2.0/ai/extract_structured' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <ACCESS_TOKEN>' \
--data '{
    "items": [
        {
            "id": "1517628697289",
            "type": "file"
        }
    ],
    "fields": [
        {
            "key": "document_type",
            "type": "enum",
            "prompt": "what type of document is this?",
            "options": [
                {
                    "key": "Invoice"
                },
                {
                    "key": "Purchase Order"
                },
                {
                    "key": "Unknown"
                }
            ]
        },
        {
            "key": "document_date",
            "type": "date"
        },
        {
            "key": "vendor",
            "description": "The name of the entity.",
            "prompt": "Which vendor is sending this document.",
            "type": "string"
        },
        {
            "key": "document_total",
            "type": "float"
        }
    ]
  }'
The response lists the specified fields and their values:
{
    "document_date": "2024-02-13",
    "vendor": "Quasar Innovations",
    "document_total": 1050,
    "document_type": "Purchase Order"
}
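The values come back as JSON, so you may want to convert them into native types before storing them. The following sketch handles the fields used in the request above. It parses the ISO-formatted date, coerces the float total, and guards the enum value against the options the request declared. `normalize_fields_answer` is a hypothetical helper. Mapping out-of-range enum values to "Unknown" is a design choice of this sketch, not API behavior.

```python
import datetime

# The enum options declared for document_type in the request above.
DOCUMENT_TYPE_OPTIONS = {"Invoice", "Purchase Order", "Unknown"}


def normalize_fields_answer(answer: dict) -> dict:
    """Coerce extracted values into Python types (a sketch, not an SDK feature)."""
    document_type = answer.get("document_type")
    if document_type not in DOCUMENT_TYPE_OPTIONS:
        # Treat anything outside the declared options as unknown.
        document_type = "Unknown"
    return {
        "document_type": document_type,
        # Assumes the date field is returned as YYYY-MM-DD, as in the example.
        "document_date": datetime.date.fromisoformat(answer["document_date"]),
        "vendor": answer.get("vendor"),
        "document_total": float(answer["document_total"]),
    }


if __name__ == "__main__":
    print(normalize_fields_answer({
        "document_date": "2024-02-13",
        "vendor": "Quasar Innovations",
        "document_total": 1050,
        "document_type": "Purchase Order",
    }))
```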

Use metadata template

If you prefer to use a metadata template, you can provide its template_key, type, and scope.
curl --location 'https://api.box.com/2.0/ai/extract_structured' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <ACCESS_TOKEN>' \
--data '{
    "items": [
        {
            "id": "1517628697289",
            "type": "file"
        }
    ],
    "metadata_template": {
        "template_key": "rbInvoicePO",
        "type": "metadata_template",
        "scope": "enterprise_1134207681"
    }
}'
The response lists the fields included in the metadata template and their values:
{
  "documentDate": "February 13, 2024",
  "total": "$1050",
  "documentType": "Purchase Order",
  "vendor": "Quasar Innovations",
  "purchaseOrderNumber": "003"
}
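With this template, the date and total come back as display strings, "February 13, 2024" and "$1050", rather than typed values. The following sketch converts them. The key names documentDate and total belong to the example rbInvoicePO template. The date format string and the dollar-stripping rule are assumptions drawn from this one response, and `%B` assumes English month names (the default C locale). All helper names are hypothetical.

```python
import datetime
from decimal import Decimal


def parse_template_date(value: str) -> datetime.date:
    """Parse dates written like 'February 13, 2024' (format is an assumption)."""
    return datetime.datetime.strptime(value, "%B %d, %Y").date()


def parse_currency(value: str) -> Decimal:
    """Strip a leading '$' and thousands separators, e.g. '$1,050.00' -> Decimal('1050.00')."""
    return Decimal(value.strip().lstrip("$").replace(",", ""))


def normalize_template_answer(answer: dict) -> dict:
    """Return a copy of the example template answer with typed date and total."""
    normalized = dict(answer)
    normalized["documentDate"] = parse_template_date(answer["documentDate"])
    normalized["total"] = parse_currency(answer["total"])
    return normalized


if __name__ == "__main__":
    print(normalize_template_answer({
        "documentDate": "February 13, 2024",
        "total": "$1050",
        "documentType": "Purchase Order",
        "vendor": "Quasar Innovations",
        "purchaseOrderNumber": "003",
    }))
```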

Enhanced Extract Agent

To use the Enhanced Extract Agent, specify the ai_agent object as follows:
{
  "ai_agent": {
    "type": "ai_agent_id", 
    "id": "enhanced_extract_agent"
  }
}
To extract data using the Enhanced Extract Agent, you need one of the following:
See the sample code snippet using the Box Python SDK:
from box_sdk_gen import (
    AiAgentReference,
    AiAgentReferenceTypeField,
    AiItemBase,
    AiItemBaseTypeField,
    BoxClient,
    BoxCCGAuth,
    CCGConfig,
    CreateAiExtractStructuredMetadataTemplate
)

# Create your client credentials grant config from the developer console
ccg_config = CCGConfig(
    client_id="my_box_client_id", # replace with your client id
    client_secret="my_box_client_secret", # replace with your client secret
    user_id="my_box_user_id", # replace with the box user id that has access
                              # to the file you are referencing
)
auth = BoxCCGAuth(config=ccg_config)
client = BoxClient(auth=auth)
# Create the agent config referencing the enhanced extract agent
enhanced_extract_agent_config = AiAgentReference(
    id="enhanced_extract_agent",
    type=AiAgentReferenceTypeField.AI_AGENT_ID
)
# Use the Box SDK to call the extract_structured endpoint
box_ai_response = client.ai.create_ai_extract_structured(
    # Create the items array containing the file information to extract from
    items=[
        AiItemBase(
            id="my_box_file_id", # replace with the file id
            type=AiItemBaseTypeField.FILE
        )
    ],
    # Reference the Box Metadata template 
    metadata_template=CreateAiExtractStructuredMetadataTemplate(
        template_key="InvoicePO",
        scope="enterprise"
    ),
    # Attach the agent config you created earlier
    ai_agent=enhanced_extract_agent_config,
)
print(f"box_ai_response: {box_ai_response.answer}")
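If you call the REST endpoint directly instead of through the SDK, the same agent reference goes into the JSON body as the ai_agent object shown earlier. The following sketch attaches it to an existing request body. `with_enhanced_extract_agent` is a hypothetical helper, and it makes a shallow copy so the caller's dictionary is left unchanged.

```python
import json


def with_enhanced_extract_agent(body: dict) -> dict:
    """Return a shallow copy of a request body that references the Enhanced Extract Agent."""
    updated = dict(body)
    updated["ai_agent"] = {"type": "ai_agent_id", "id": "enhanced_extract_agent"}
    return updated


if __name__ == "__main__":
    body = {
        "items": [{"id": "1517628697289", "type": "file"}],
        "metadata_template": {
            "template_key": "rbInvoicePO",
            "type": "metadata_template",
            "scope": "enterprise_1134207681",
        },
    }
    print(json.dumps(with_enhanced_extract_agent(body), indent=2))
```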

Tutorial: Automate invoice intake with Box AI Extract

See structured extraction in action. Build an end-to-end automation that watches a folder, extracts invoice fields, and writes metadata back to each file.