Training Metrics: Analyze the Results of Your Custom Extraction Workflow

Training a custom extraction model is only half the journey - understanding how well it performs is just as important. In this tutorial, we’ll walk through how to properly analyze your model’s training results and break down what the key metrics actually mean. By learning how to interpret these metrics, you’ll be able to identify weaknesses, and take practical steps to improve your model’s accuracy.

trainingsmetriken-analysenergebnis-1
Analyzing training results is key to understanding where your AI model excels, where it needs improvement, and how you can optimize it. Below, we guide you step by step through reviewing your training metrics and improving your model’s accuracy.

Start on the Workflow Dashboard

We start on the Workflow Dashboard of our Custom Extraction. In the section “Training Status”, you can already get a quick overview of your model’s performance.

To dive deeper, select “View Trainings”. Here you can access all current and historical training metrics, helping you track your model’s progress. Remember, the model should improve with each new round of reviewed or added data.
training-metrics-results-analysis-1

Your Model’s Overall Accuracy

The Overall Accuracy indicates how often the model correctly predicts the data fields in your documents, i.e. how many text boxes were correctly identified by the AI.
In our model, the overall accuracy is 93%, which means it still needs some improvement.
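To make the figure tangible: overall accuracy is simply the share of text boxes that received the correct type. A minimal sketch in Python, using invented counts rather than the real ones from our model:

```python
# Overall accuracy = correctly typed text boxes / all text boxes in the validation set.
# The counts below are invented for illustration.
correct_boxes = 93
total_boxes = 100

overall_accuracy = correct_boxes / total_boxes
print(f"Overall accuracy: {overall_accuracy:.0%}")  # -> Overall accuracy: 93%
```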

Note: For more details, you can click on the question mark symbols next to each metric.
training-metrics-results-analysis-2

Review Template Metrics

The first step in finding out where we can improve the model is to check the metrics of each template, in case you separated your data into templates. If only one template is performing poorly, we can focus on improving that one first.

In our model, the template “Bank A” has an overall accuracy of 85%. This means we need to have a closer look at this template.

When clicking on the eye icon, we can see the metrics for all the data fields we created within this template.
training-metrics-results-analysis-3

Understanding The Metrics

In the overview, you can find various metrics that describe different aspects of your AI model’s performance.

Number of Boxes:
Indicates how many bounding boxes of this type were in the validation set. If the metrics are low even though there are many boxes, the model genuinely struggles with this type. If both the number of boxes and the metrics are low, it might just be a coincidence. In this case, try to add more samples containing this type.

Recall:
Of all the text boxes that should have been assigned this type, recall tells you how many the model actually found.

Precision:
Tells you how often the model is correct when it assigns a type to a text box.

F1 Score:
Combines precision and recall into a single measure of the model’s performance. It is the harmonic mean of the two, so both metrics are weighted equally.
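To make the relationship between these metrics concrete, here is a minimal Python sketch that computes them from raw prediction counts for a single data field. The counts are invented for illustration and are not taken from the model in this tutorial.

```python
# Illustrative counts for one data field (e.g. "Bank name"); not real tutorial data.
true_positives = 13   # boxes of this type that the model labelled correctly
false_positives = 3   # boxes of other types that the model wrongly labelled as this type
false_negatives = 7   # boxes of this type that the model missed

precision = true_positives / (true_positives + false_positives)  # how often a prediction of this type is right
recall = true_positives / (true_positives + false_negatives)     # how many boxes of this type were found
f1_score = 2 * precision * recall / (precision + recall)         # harmonic mean of precision and recall

print(f"Precision: {precision:.0%}, Recall: {recall:.0%}, F1: {f1_score:.2f}")
```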

In our metrics, we can see that the “Bank name” data field shows quite poor results; the AI seems to have problems recognizing this type. To improve the model, we can now review the training data for the Bank A template and inspect the “Bank name” data field there.

But let’s first have a look at the overall performance of this data field.
training-metrics-results-analysis-4

Check Field Performance Across Templates

To do so, we click on the button “See Aggregated Detailed Metrics”.
Here we can see that the overall performance of the “Bank name” field is also quite low, at only 65%.
training-metrics-results-analysis-5
training-metrics-results-analysis-6
If we scroll down to the bottom, we can find the most common errors, which show us which data fields were most often confused. To get more detailed information, we can consult the Confusion Matrix.

View Confusion Matrix

The Confusion Matrix shows the model’s performance by comparing its predicted outcomes against the actual true outcomes.
When we take a closer look at “Bank name”, we can see that the AI apparently has problems distinguishing this field from “None”.

Only 65% of the “Bank name” fields are recognized as such.
training-metrics-results-analysis-7
training-metrics-results-analysis-8
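If you want to reproduce this kind of table outside the dashboard, the sketch below builds a confusion matrix with scikit-learn from a handful of made-up labels. The field names and counts are purely illustrative, and scikit-learn is used here only as a convenient, generic tool; it is not part of the workflow itself.

```python
# A toy confusion matrix for two field types plus "None"; all labels are invented.
from sklearn.metrics import confusion_matrix

labels = ["Bank name", "IBAN", "None"]
true_types      = ["Bank name", "Bank name", "Bank name", "IBAN", "None", "None"]
predicted_types = ["Bank name", "None",      "None",      "IBAN", "None", "None"]

# Rows are the true types, columns the predicted types. The "Bank name" row shows
# how many "Bank name" boxes were recognized as such and how many were confused with "None".
matrix = confusion_matrix(true_types, predicted_types, labels=labels)
print(matrix)
```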

Improve Your Model

To improve the accuracy, we can review the training data for the relevant template, check the poorly performing data field, and upload more training data if needed.

To review your training data, go to the “Training Data” tab and select “Review Training Data” in template Bank A.
Go through the documents and check the annotations for the field “Bank name”:
If the field is marked incorrectly, click on it and assign the correct text boxes. Save your changes and repeat for all documents.
training-metrics-results-analysis-9
If fields are consistently correct, consider uploading additional training data to give the AI more examples.

Once the review is complete, start a new training session. The model’s performance should improve. Repeat the steps as needed until your model reaches the desired accuracy.

Now It’s Your Turn

Review your training results, improve your model, and contact our AI experts whenever you need support. With careful review and iteration, your Custom Extraction Workflow can reach its full potential!