At natif.ai, we have a great invoice extraction model that you can use to extract the relevant information you need for your downstream analysis. We also enable you to fine-tune this generic invoice extraction model on your own data to squeeze out even more performance. We trained the model to extract as much information as possible. Still, no model covers everything, so your data may contain information that our model was not trained to extract.
We understand how important it is to extract all the information you need from your documents. An upcoming solution will therefore enable you to further train the model to extract your specific information.
However, until this is ready, we want to show you a temporary solution for how to use our APIs to extract all the information you need.
To do so, we will use two APIs: invoice extraction model and train your custom extraction model. The idea is to take everything the invoice extraction model can extract and, for whatever information it does not extract, to train a custom extraction model that does.
So, let’s see an example.
In the first step, use the invoice extraction model to retrieve the information it was trained on. Simply upload the files and retrieve the extractions. Alternatively, use our fine-tuning functionality to create an optimized invoice extraction model (check out our blog post for detailed steps).
As you can see from the screenshot, the model detected a lot of information. However, two entities in our data are not detected by this model: “Vendor Contact” and “Communication ID”.
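To make the gap explicit, you can compare the entities you need against what the invoice extraction model returned. The following is a minimal sketch that assumes the extraction result has already been loaded into a flat dictionary of entity name to value. That shape, the entity keys, the sample values, and the helper name `missing_entities` are all hypothetical and are not the actual natif.ai response format.

```python
from typing import Any, List, Mapping


def missing_entities(extraction: Mapping[str, Any], required: List[str]) -> List[str]:
    """Return the required entity names that are absent or empty in the extraction."""
    missing = []
    for name in required:
        value = extraction.get(name)
        # Treat a missing key, None, or an empty string as "not extracted".
        if value is None or value == "":
            missing.append(name)
    return missing


# Hypothetical result of the invoice extraction model (illustrative values only).
invoice_result = {
    "invoice_number": "INV-001",
    "vendor_name": "ACME GmbH",
    "total_amount": "120.00",
}

# Everything the downstream analysis needs, including the two extra entities.
required_entities = [
    "invoice_number",
    "vendor_name",
    "total_amount",
    "vendor_contact",
    "communication_id",
]

to_train = missing_entities(invoice_result, required_entities)
print(to_train)  # ['vendor_contact', 'communication_id']
```

The returned list is the set of entities the custom extraction model needs to cover.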
Next, train a custom extraction model on your own data to detect these two entities.
A detailed blog post on how to create a custom extraction API can be found here.
Use the trained custom model to extract the remaining information.
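Once both models have processed the same document, the two results can be combined into a single record. The sketch below assumes each result is available as a flat dictionary of entity name to value. The dictionaries, keys, sample values, and the helper `merge_extractions` are hypothetical illustrations, not part of the natif.ai API.

```python
from typing import Any, Dict, Mapping


def merge_extractions(
    invoice_result: Mapping[str, Any],
    custom_result: Mapping[str, Any],
) -> Dict[str, Any]:
    """Combine both results; the invoice model wins when both provide a value."""
    merged = dict(invoice_result)
    for name, value in custom_result.items():
        existing = merged.get(name)
        # Only fill entities the invoice model left missing or empty.
        if existing is None or existing == "":
            merged[name] = value
    return merged


# Hypothetical outputs of the two models for the same document.
invoice_result = {"invoice_number": "INV-001", "vendor_name": "ACME GmbH"}
custom_result = {
    "vendor_contact": "Jane Doe",
    "communication_id": "C-42",
    "vendor_name": "Acme",  # ignored: the invoice model already extracted this
}

combined = merge_extractions(invoice_result, custom_result)
print(combined)
```

Giving the invoice extraction model precedence is a design choice made for this sketch: the custom model is only meant to fill the gaps, so its values are used solely for entities the invoice model did not return.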
That’s it! This is how you extend our invoice extraction model to cover any additional information that the model was not trained on.