Smart Field Configuration: Eliminate Ambiguity in Data Extraction

Defining data fields is an important step in creating a Custom Extraction Model. Even data fields of the same type may require different formats and configurations. With our Smart Field Configuration, you can define these details and achieve precise results.

Poorly defined data fields can lead to inaccurate extraction results – especially when dealing with numbers, date formats, or complex hierarchies. Even within the same field type, small differences in structure can cause big issues.
 
Our Smart Field Configuration solves this problem by giving you full control: define number formats, date formats, and line item structures exactly the way you need.
 
In this tutorial, we’ll show you how to use Field Configuration to eliminate ambiguity and ensure your Custom Extraction Model delivers precise, reliable results.

Your Benefits With Field Configurations

Our new Smart Field Configuration Feature makes defining and managing data fields easier than ever. With extensive data field explanations, you gain a deeper understanding of each field’s purpose and functionality.

Extended configuration settings allow for precise adjustments to meet your specific needs, while the simplified creation process ensures a more efficient workflow. These enhancements help you achieve greater accuracy and flexibility in your extraction models.

Optional Numbers Settings

With the Fallback Decimal Separator, you can determine how an ambiguous number such as 1,234 is interpreted: whether its separator marks the decimal places or groups thousands. There is no universal standard for number formats worldwide.

You can currently choose between these two formats:

Comma (,) → Used in many European countries (e.g., 3,14)
Dot (.) → Common in English-speaking countries (e.g., 3.14)
Note: If at least one number in your documents has an unambiguous separator, we will learn from it and correctly interpret ambiguous numbers as well.
For example, suppose the number 1,000.50 appears on one page. Because it contains both separators, the comma can only be the thousands separator and the dot the decimal separator. We remember this pattern, so an otherwise ambiguous number in the same document is now unambiguous, even without additional settings (Memory Mechanism).
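
The interplay between the fallback setting and the Memory Mechanism can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the product's actual implementation: the function names (infer_decimal_separator, parse_numbers) and the fallback_decimal_sep parameter are hypothetical, and the rule "a single separator followed by exactly three digits is ambiguous" is a simplification.

```python
def infer_decimal_separator(samples):
    """Return '.' or ',' if any sample settles the question, else None.

    Hypothetical helper: mimics the idea of learning from one clear number.
    """
    for text in samples:
        seps = [ch for ch in text if ch in ".,"]
        # Both separators present (1,000.50): the last one is the decimal separator.
        if "." in seps and "," in seps:
            return seps[-1]
        # Same separator repeated (1.000.000): it can only group thousands.
        if len(seps) > 1:
            return "," if seps[0] == "." else "."
        # One separator not followed by exactly three digits (3,14): decimal.
        if len(seps) == 1:
            tail = text.rsplit(seps[0], 1)[1]
            if len(tail) != 3:
                return seps[0]
    return None  # every sample is ambiguous


def parse_number(text, decimal_sep):
    """Convert text to float, given which character is the decimal separator."""
    thousands_sep = "," if decimal_sep == "." else "."
    return float(text.replace(thousands_sep, "").replace(decimal_sep, "."))


def parse_numbers(samples, fallback_decimal_sep="."):
    """Use a learned separator if one exists, otherwise the configured fallback."""
    decimal_sep = infer_decimal_separator(samples) or fallback_decimal_sep
    return [parse_number(text, decimal_sep) for text in samples]


# "1,234" alone is ambiguous, so the fallback decides.
comma_fallback = parse_numbers(["1,234"], fallback_decimal_sep=",")
# Next to "1,000.50" the comma is known to group thousands.
learned = parse_numbers(["1,000.50", "1,234"], fallback_decimal_sep=",")
```

In the second call, the fallback is set to comma, but the clear number 1,000.50 takes precedence, so 1,234 is read as one thousand two hundred thirty-four.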

Adjusting Date Formats

The Priority Rule for Date Interpretation determines how the model interprets a date in cases of ambiguity. Since there is no universal standard for date formats worldwide, the system follows predefined rules to convert dates into a standardized form.

You can currently choose between these three formats:

Day First (DD.MM.YYYY) → E.g., 25.04.2025
Month First (MM/DD/YYYY) → E.g., 04/25/2025
Year First (YYYY-MM-DD) → E.g., 2025-04-25

A date such as 25.04.2025 is a clear example: there is no 25th month, so 25 must be the day.
For numbers up to 12, interpretation can be challenging because it is unclear whether they represent the day or the month. For example, with “03/12/2025”, is it December 3rd or March 12th? To address such ambiguous cases, you can explicitly specify the Fallback Date Format for your documents to ensure accurate interpretation.

Note: If at least one date in your documents is unambiguous, we will learn from it and correctly interpret ambiguous dates as well.
For example, suppose the unambiguous date 25/04/2025 appears on a document.
From this date, the system learns that dates in this document start with the day. It then applies this knowledge to ambiguous dates as well, even without additional settings (Memory Mechanism).
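
The same idea can be sketched for dates. The following Python snippet is a minimal illustration, not the product's implementation: the names infer_order and parse_dates and the fallback values "day_first" and "month_first" are hypothetical, and a leading number above 31 is simply assumed to be a year (Year First).

```python
import re
from datetime import date


def split_date(text):
    """Split '25/04/2025', '25.04.2025' or '2025-04-25' into three integers."""
    return [int(part) for part in re.split(r"[./-]", text)]


def infer_order(samples):
    """Return 'day_first' or 'month_first' if one sample settles it, else None."""
    for text in samples:
        first, second, _ = split_date(text)
        if first > 31:
            continue  # year-first dates say nothing about day/month order
        if first > 12 and second <= 12:
            return "day_first"    # e.g. 25/04/2025: there is no 25th month
        if second > 12 and first <= 12:
            return "month_first"  # e.g. 04/25/2025
    return None  # all samples are ambiguous


def parse_date(text, order):
    first, second, third = split_date(text)
    if first > 31:  # assumed to be YYYY-MM-DD
        return date(first, second, third)
    if order == "day_first":
        return date(third, second, first)
    return date(third, first, second)


def parse_dates(samples, fallback="day_first"):
    """Use the learned order if one exists, otherwise the configured fallback."""
    order = infer_order(samples) or fallback
    return [parse_date(text, order) for text in samples]


# Alone, 03/12/2025 follows the fallback; next to 25/04/2025 it is read day first.
by_fallback = parse_dates(["03/12/2025"], fallback="month_first")
by_memory = parse_dates(["25/04/2025", "03/12/2025"], fallback="month_first")
```

Here by_fallback reads 03/12/2025 as March 12th, while by_memory reads it as December 3rd because the clear date 25/04/2025 establishes the day-first order.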

Define Hierarchical Line Items

Imagine you have a list where “Delivery Number 1020” is written once, and below it there are items like “Cement Bags,” “Sand,” and “Gravel.” The problem is that the model doesn’t automatically know that these items belong to “Delivery 1020” because the number isn’t repeated in each row.
In this case, the Hierarchical Items setting works as follows:

– The Delivery Number applies to multiple items.
– It should be copied downward to each related item.
So instead of requiring the Delivery Number in every row, the model understands that everything under “Delivery 1020” belongs to it – until a new Delivery Number appears, and the process starts over.
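
The "copy downward" behavior described above can be sketched as a simple fill-down over extracted rows. This is a minimal Python illustration, not the product's implementation: the function fill_down, the field names, and the second delivery ("1021", "Bricks") are made up for the example.

```python
def fill_down(rows, parent_field):
    """Copy the last seen value of parent_field into rows where it is missing."""
    filled = []
    current = None
    for row in rows:
        value = row.get(parent_field)
        if value:
            current = value  # a new parent value starts a new group
        filled.append({**row, parent_field: current})
    return filled


# Hypothetical extracted rows: the delivery number is printed only once per group.
rows = [
    {"delivery_number": "1020", "item": "Cement Bags"},
    {"delivery_number": None, "item": "Sand"},
    {"delivery_number": None, "item": "Gravel"},
    {"delivery_number": "1021", "item": "Bricks"},
]

filled_rows = fill_down(rows, "delivery_number")
```

After the call, "Sand" and "Gravel" carry delivery number 1020, and the group ends when 1021 appears.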

Detecting Line Items Across Pages

Imagine you have an item description that starts on page 1 and continues onto page 2. By default, the system does not recognize that both parts belong to the same item, so they are treated as two separate entries.
This setting resolves the issue by using a unique identifier that links related items together, such as an Article Number.

In this case, the setting works as follows:

– “Article Number 1” on page 1 will be identified.
– On page 2, a text appears without a new Article Number.
– With this setting, the text is correctly linked to “Article Number 1”.
Without this setting, the second part on page 2 would be treated as another list element without an Article Number, containing only a description. Using a unique identifier ensures that content stays grouped correctly, even if it spans multiple pages.
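
The linking logic can be sketched as follows. This Python snippet is a simplified illustration under stated assumptions, not the product's implementation: the function merge_across_pages, the field names, and the sample descriptions are hypothetical, and a row without an identifier is simply treated as a continuation of the previous item.

```python
def merge_across_pages(rows, id_field, text_field):
    """Attach rows without an identifier to the most recent row that has one."""
    merged = []
    for row in rows:
        if row.get(id_field) or not merged:
            merged.append(dict(row))  # a new identifier starts a new item
        else:
            previous = merged[-1]
            combined = f"{previous.get(text_field) or ''} {row.get(text_field) or ''}"
            previous[text_field] = combined.strip()
    return merged


# Hypothetical rows: the description of article 1 continues on page 2.
rows = [
    {"page": 1, "article_number": "1", "description": "Steel beam, galvanized,"},
    {"page": 2, "article_number": None, "description": "cut to length"},
    {"page": 2, "article_number": "2", "description": "Mounting bracket"},
]

items = merge_across_pages(rows, "article_number", "description")
```

The result contains two items instead of three: the text from page 2 is appended to the description of article 1, and article 2 starts a new entry.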

Differences Between Pages

Another scenario involves slight differences between pages. Example:
– Page 1 contains “Article Number” and “Description”.
– Page 2 contains “Article Number”, “Description” and “Price”.

Although this looks like a continuation of the same list, the system treats it as two separate lists because the “Article Number” appears again on page 2 and is configured as a unique identifier.

In such cases, the entries would only be merged if the logic allowed repetition of certain fields (e.g. “Price”), while still treating “Article Number” as the primary anchor.

This highlights the importance of how different field settings interact when identifying list continuity across pages.

Create Your Custom Model Now

Our new Smart Field Configuration empowers you to create your perfect extraction models with ease. With more flexibility and control, you can tailor data fields to your specific needs and improve accuracy effortlessly.

Now it’s time to put these features into action: start exploring the new feature today and create your custom extraction model to achieve even more precise and efficient results!