Defining Data Fields Precisely for Effective Document Extraction

To ensure that our AI knows exactly what content it should extract from your documents, you need to clearly define the relevant data fields. This is particularly important if you want to train your own extraction model on our platform.

When processing invoices, bank statements, delivery notes and other documents, it is important to precisely extract the content you need. Accurate extraction ensures not only correct data processing but also effective use and analysis for various business processes.

What are Data Fields?

Data fields are the specific pieces of information that are extracted from a document, for example the invoice number, the customer details or the payment details.

When you create your own extraction model, you can specify which data fields the AI should extract from your documents. This allows you to filter out exactly the information you need for your business processes.

We distinguish between Basic Types and Advanced Types.

Basic Types

Basic types are data fields that contain a single value, such as a name or date. The basic types are free text, number, date and identifier.

Free Text

Free Text extracts a string of text consisting of one or more words. It can be used to capture information such as the recipient’s name, a street address or the description of a service.

Number

Number extracts a numeric value from your document, such as an age, a weight, a number of items or a price.

Date

Date extracts date information from your document in the format YYYY-MM-DD, for example ‘2025-01-01’. This allows information such as a date of birth, delivery date or document date to be recorded in a structured manner, which makes it easier to sort and organise documents chronologically.
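
To illustrate what a uniform date format such as ‘2025-01-01’ means in practice, here is a minimal Python sketch that converts a few common notations into that format. The function name normalize_date and the list of input formats are assumptions made for this example; the article does not state which notations the platform recognises or how it converts them.

```python
from datetime import datetime
from typing import Optional

# Hypothetical input notations, chosen only for illustration.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no candidate format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # strptime raises ValueError when the text does not fit the format
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

With these assumed formats, a delivery date written as 31/12/2024 and a document date written as 01.01.2025 both end up in the same sortable form.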

Identifier

Identifier extracts alphanumeric combinations from your documents, useful for capturing unique identifiers like tax IDs or personnel numbers.
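
The four basic types described above can be pictured as a small set of labels that you attach to each field name. The following Python sketch is only a way of thinking about such a definition; the names BasicType, BasicField and invoice_fields are hypothetical and do not reflect the platform's actual configuration format.

```python
from dataclasses import dataclass
from enum import Enum


class BasicType(Enum):
    """The four basic types named in this article."""
    FREE_TEXT = "free_text"
    NUMBER = "number"
    DATE = "date"
    IDENTIFIER = "identifier"


@dataclass(frozen=True)
class BasicField:
    """A single-value data field: a name plus one basic type (hypothetical model)."""
    name: str
    type: BasicType


# Example field definitions, using the kinds of content mentioned above
invoice_fields = [
    BasicField("recipient_name", BasicType.FREE_TEXT),
    BasicField("price", BasicType.NUMBER),
    BasicField("document_date", BasicType.DATE),
    BasicField("tax_id", BasicType.IDENTIFIER),
]
```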

Advanced Types

Advanced types are data fields that group several basic types together. This allows data fields to be arranged hierarchically. For example, an advanced type may consist of the basic types Name (Free Text) and Age (Number). The advanced types are ‘Combined’, ‘List’ and ‘Table’.

Combined

Combined groups several basic types into one data field, so that related pieces of information are collected and processed together.
For example, the name (Free Text), address (Free Text) and personnel number (Identifier) can be grouped together under personal details.

In the same way, the net amount (Number), the gross amount (Number) and the tax amount (Number) can be grouped under payment amounts.
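
The two groupings above can be sketched as follows. The dictionary notation, the type labels and the helper matches_definition are assumptions made for this example, not the platform's own schema format, and the amounts are invented sample values.

```python
# Hypothetical notation: a Combined field maps sub-field names to basic types.
personal_details = {
    "name": "free_text",
    "address": "free_text",
    "personnel_number": "identifier",
}

payment_amounts = {
    "net_amount": "number",
    "gross_amount": "number",
    "tax_amount": "number",
}

# An extraction result for a Combined field has the same keys as its definition.
# The values below are invented sample data.
example_result = {"net_amount": 100.0, "gross_amount": 119.0, "tax_amount": 19.0}


def matches_definition(result: dict, definition: dict) -> bool:
    """Check that a result contains exactly the sub-fields the definition names."""
    return result.keys() == definition.keys()
```

Grouping the values this way keeps related information in one place, so a later processing step can work with payment amounts as a unit instead of three loose fields.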

List

List captures several values of the same kind, either individual basic types or groups defined with Combined. This is particularly useful for data that appears in a document repeatedly, in a particular order or as a group.
Examples include order numbers (Identifiers), IBANs (Identifiers) and items extracted using Combined.
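
As a rough sketch of the two variants, a List can hold plain values of one basic type or entries that each follow a Combined definition. The class Item and all sample values below are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """One Combined entry (hypothetical): a description plus a quantity."""
    description: str  # Free Text
    quantity: float   # Number


# A List of individual Identifier values (invented sample data)
order_numbers: List[str] = ["PO-1001", "PO-1002"]

# A List whose entries are Combined groups (invented sample data)
items: List[Item] = [Item("Paper A4", 5), Item("Toner", 1)]
```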

Table

Table extracts tabular data and keeps its structure of columns and rows, so that each value stays associated with its column and its row.
Examples include items on invoices with different columns, meter readings on heating bills and individual items on payslips.
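
A Table can be pictured as a set of typed columns plus one record per row. The sketch below assumes a simple dictionary notation; the column names, the sample rows and the helper column_values are hypothetical and not taken from the platform.

```python
from typing import Any, Dict, List

# Hypothetical column definition for invoice items: column name -> basic type
invoice_item_columns = {
    "position": "number",
    "description": "free_text",
    "unit_price": "number",
}

# Each extracted row holds one value per column (invented sample data)
rows: List[Dict[str, Any]] = [
    {"position": 1, "description": "Consulting", "unit_price": 80.0},
    {"position": 2, "description": "Travel", "unit_price": 12.5},
]


def column_values(table_rows: List[Dict[str, Any]], name: str) -> List[Any]:
    """Return all values of one column, in row order."""
    return [row[name] for row in table_rows]
```

Reading a whole column at once, for example all unit prices, is what distinguishes a table from a loose collection of values.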

Now you are ready to extract the information from your documents in a precise and efficient way and to use it for your business processes.

By clearly defining data fields, you lay the foundation for successful processing and analysis of your documents. Get started now and improve your workflows with accurate extraction of relevant data.