At natif.ai, we have a great invoice extraction model that you can use to extract the information you need for your downstream analysis. We also let you fine-tune this generic invoice extraction model on your own data to squeeze out even more performance. We trained the model to extract as much information as possible. However, no model can cover everything, so your data might contain information that our model was not trained to extract.
We understand how important it is to extract all the information you need from your documents. An upcoming feature will therefore let you further train the model to extract your specific information.
Until this is ready, we want to show you a temporary solution that uses our APIs to extract all the information you need.
To do so, we will use two APIs: the invoice extraction model and train your custom extraction model. The idea is to take everything the invoice extraction model can extract and, for any information it does not cover, train a custom extraction model to extract it.
So, let’s see an example.
1. Extract information with generic invoice model
In the first step, use the invoice extraction model to retrieve the information it was trained on. Simply upload the files and retrieve the extractions. Alternatively, use our fine-tuning functionality to create an optimized invoice extraction model (check out our blog post for detailed steps).
2. Identify missing information
As you can see from the screenshot, the invoice extraction model detected a lot of information. However, two entities in our data are not detected by this model: “Vendor Contact” and “Communication ID”.
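If you prefer to check this programmatically rather than by eye, the comparison can be sketched as follows. This is only an illustration: it assumes the extraction result has already been parsed into a plain Python dictionary that maps entity names to values, which may differ from the actual API response format. The function name, the "Invoice Number" entity, and the sample values are hypothetical.

```python
def find_missing_entities(required, extracted):
    """Return the required entity names that are absent or empty in `extracted`.

    `extracted` is assumed to be a dict of entity name -> value
    (a hypothetical, simplified view of a model's output).
    """
    return [name for name in required if not extracted.get(name)]


# Hypothetical list of entities needed downstream.
required_entities = ["Invoice Number", "Vendor Contact", "Communication ID"]

# Hypothetical, simplified output of the generic invoice model.
generic_output = {"Invoice Number": "INV-001"}

missing = find_missing_entities(required_entities, generic_output)
print(missing)
```

The entities returned here are the ones to cover with the custom extraction model in the next step.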
3. Train a custom extraction model
Train a custom model on your data so that it detects these two entities.
The model is created in the following four steps:
- Specify the entities to extract.
- Upload your data.
- Annotate these two entities in your data.
- Train the custom model.
A detailed blog post on how to create a custom extraction API can be found here.
4. Retrieve the missing information
Use the trained custom model to extract the remaining information.
At the end, your workflow should look like the following:
An example of using the output of the two models looks like this:
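As a minimal sketch of combining the two outputs, assume each model's result has already been parsed into a plain Python dictionary mapping entity names to values. The actual API response format may differ, and the function name, entity values, and precedence rule below are hypothetical choices made for illustration.

```python
def merge_extractions(generic, custom):
    """Combine two extraction results into one dictionary.

    Values from the generic invoice model take precedence; the custom
    model only fills in entities that are missing or empty there.
    Both inputs are assumed to be dicts of entity name -> value.
    """
    merged = dict(generic)  # copy so the input is not modified
    for name, value in custom.items():
        has_value = value not in (None, "")
        if has_value and not merged.get(name):
            merged[name] = value
    return merged


# Hypothetical, simplified outputs of the two models.
generic_output = {"Invoice Number": "INV-001", "Total Amount": "119.00"}
custom_output = {"Vendor Contact": "Jane Doe", "Communication ID": "C-42"}

combined = merge_extractions(generic_output, custom_output)
print(combined)
```

Letting the generic model win on overlapping entities is one possible design choice; if your custom model is more reliable for a given field, you could reverse the precedence for that field.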
That’s it! This is how you extend our invoice extraction model to cover any additional information that it was not trained on.