This Is How You Can Extract All the Information on Your Invoices

Our generic invoice model has been trained to extract as much information as possible. Still, your invoices may contain content that the model was never explicitly trained on. This post shows how you can make sure to extract all the information that is relevant to you.

At natif.ai, we offer an invoice extraction model that you can use to extract the information you need for your downstream analysis. You can also fine-tune this generic invoice extraction model on your own data to squeeze out even more performance. We trained the model to extract as much information as possible, but no model is perfect, and your data may contain information that our model was not trained to extract.
We understand how important it is to extract all the information you need from your documents. An upcoming feature will therefore let you train the model further so that it extracts your specific information.
Until that is ready, we want to show you a temporary solution that uses our existing APIs to extract all the information you need.

To do so, we will use two APIs: the invoice extraction model and the one for training your own custom extraction model. The idea is to take everything the invoice extraction model can extract and, for whatever information is still missing, train a custom extraction model to extract it.

So, let’s see an example. 

Extract information with generic invoice model

In the first step, use the invoice extraction model to retrieve the information it was trained on. Simply upload the files and retrieve the extractions. Alternatively, use our fine-tuning functionality to create an optimized invoice extraction model first (check out our blog post for detailed steps).

Identify missing information

As you can see from the screenshot, the invoice extraction model detected a lot of information. However, two entities in our data are not detected by this model: “Vendor Contact” and “Communication ID”.
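If you want to run this check on many invoices rather than by eye, you can compare the entities you need against what the generic model returned. The following is only a minimal sketch: it assumes the extraction is available as a flat dictionary that maps entity names to values, and the field names and the helper `find_missing_entities` are hypothetical, not part of the natif.ai API.

```python
def find_missing_entities(required, extraction):
    """Return the required entity names that have no usable value."""
    missing = []
    for name in required:
        value = extraction.get(name)
        # Treat absent keys, None, and empty strings as "not extracted".
        if value is None or value == "":
            missing.append(name)
    return missing


# Hypothetical, simplified output of the generic invoice model.
generic_extraction = {
    "invoice_number": "INV-001",
    "total_amount": "119.00",
}

# Entities we need downstream (names are assumptions for illustration).
required_entities = [
    "invoice_number",
    "total_amount",
    "vendor_contact",
    "communication_id",
]

missing = find_missing_entities(required_entities, generic_extraction)
print(missing)
```

The entities that end up in `missing` are the ones to cover with a custom extraction model.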

Train a custom extraction model

Next, train a custom model on your data so that it detects these two entities.
The model is created in the following four steps:
  1. Specify the entities to extract.
  2. Upload your data.
  3. Annotate these two entities in your data.
  4. Train the custom model.
A detailed blog post on how to create a custom extraction API can be found here.

Retrieve the missing information

Use the trained custom model to extract the remaining information.
At the end, your workflow should look like the following:
An example of using the output of the two models looks like this: 
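In code, combining the two outputs could be sketched as follows. This assumes each model's result has already been reduced to a flat dictionary of entity names and values; the dictionary shape, the field names, and the helper `merge_extractions` are assumptions for illustration and do not reflect the actual API response format.

```python
def merge_extractions(generic, custom):
    """Combine both results; generic values win, custom values fill gaps."""
    merged = dict(generic)  # copy so the input is not modified
    for name, value in custom.items():
        # Only take the custom value where the generic model found nothing.
        if merged.get(name) in (None, ""):
            merged[name] = value
    return merged


# Hypothetical, simplified results from the two models.
generic_result = {"invoice_number": "INV-001", "total_amount": "119.00"}
custom_result = {
    "vendor_contact": "Jane Doe",
    "communication_id": "C-4711",
}

combined = merge_extractions(generic_result, custom_result)
print(combined)
```

Letting the generic model take precedence is a design choice in this sketch; if your custom model is more reliable for an overlapping field, you can reverse the priority.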
That’s it! This is how you extend our invoice extraction model to cover any additional information that it was not trained on.