At natif.ai we work hard to provide an amazing generic invoice extraction model. The goal of this model is to be as good as possible on every invoice you can imagine, which is obviously a very hard goal to achieve.
However, we are aware that in your everyday life you are unlikely to receive every invoice, but are more likely to receive invoices from certain issuers over and over again, and from others less often. So amazing-generic may not be your ultimate goal, but rather perfection-for-your-issuers!
Many of our competitors try to achieve perfection by manually defining tailor-made rules to extract information from specific invoice templates. But what do you do when you occasionally receive invoices that don’t follow these templates? Should you create a new set of rules for every other invoice? Should you simply process such invoices manually?
Well, with natif.ai you don’t have to. Our amazing AI model can now be trained to be your perfect AI model! All it needs is a few minutes of your time and it will learn how to process your own invoices with impeccable precision.
In this post, we will walk you through the steps of our fine-tuned invoice extraction feature.
Let’s Start
We start in the Workflow Overview of our platform and choose “Train Your Own Model Now”.
Select Your Workflow
Here you can find all our Custom AI Workflows. For our Fine-Tuned Invoice Workflow, we select “Create Fine-Tuned Invoice Extraction”.
Describe Your Fine-Tuned Extraction Workflow
We start by describing our workflow: we give it a name and a short description. You can also upload an image, which will help you distinguish this workflow from the others.
Specify Your Documents
Now we have to give the AI some information about our documents so it knows which tasks need to be done. This also improves the accuracy of your workflow.
For a Fine-Tuned Extraction Workflow the AI needs to know:
– Are the documents always perfectly cropped or should they be cropped in the workflow?
– Is the text printed or handwritten? Or can it be both?
Your Workflow Is Created
Your model is ready but is initially just a simple generic one. It still needs your guidance to excel!
Right now, you can already test the model on your own data. However, the results may not be satisfactory yet, because the model still relies only on its generic knowledge. If your data poses unique challenges, you will need to give it some training first. For this we select “Upload Training Data”.
Upload Your Training Data
You are now ready to upload your own data for annotation. For the highest accuracy, we recommend that you upload your data issuer by issuer (templates), so that you can annotate all samples of a particular template, one after the other, for consistency.
Also, uploading issuer by issuer will allow you to later obtain issuer specific metrics so you can investigate what works well and what does not, which in turn will help you to further optimise your invoice model.
Of course, you don’t have to use this functionality: you can simply create an ‘unknown’ issuer and upload files from different issuers to it.
Please upload a minimum of 5 documents per template. It’s very important to select documents that are very similar to the documents that the model will process later.
This will help the AI get a full understanding of your documents and provide high accuracy processing.
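Before uploading, it can help to check locally that every template has enough samples. The following is a minimal sketch, not part of the natif.ai platform: it assumes you keep your training files in one folder per issuer (a hypothetical layout), and the list of file extensions is an assumption as well. It counts the documents per issuer and lists the issuers that fall below the minimum of 5 mentioned above.

```python
from collections import Counter
from pathlib import Path

# Minimum number of documents per template recommended in this post.
MIN_DOCS_PER_TEMPLATE = 5

# Assumption: file types you intend to upload; adjust to your own data.
ALLOWED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_documents_per_issuer(root: Path) -> Counter:
    """Count documents in each sub-folder of `root` (one folder per issuer)."""
    counts = Counter()
    for issuer_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[issuer_dir.name] = sum(
            1
            for f in issuer_dir.iterdir()
            if f.is_file() and f.suffix.lower() in ALLOWED_SUFFIXES
        )
    return counts


def issuers_below_minimum(counts, minimum=MIN_DOCS_PER_TEMPLATE):
    """Return the issuers that have fewer documents than `minimum`."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, `issuers_below_minimum(count_documents_per_issuer(Path("training_data")))` would list the issuer folders inside a hypothetical `training_data` directory that still need more samples.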
Annotate Your Training Documents
Now we have to annotate our uploaded documents. That means we have to teach the AI where to find which data field in our documents.
On the left you can see all the predefined data fields of our basic invoice extraction model!
The colours show you which text fields in the document match which data field. They are also highlighted when you hover over the data field.
If you want to remove a text box, select “None” and then click on the text box. If you want to add one, first select the appropriate data field and then click on the text box.
To make your life easier, you can select “Show all colors” and enable the confidence level view. If some text fields appear in yellow or red, it means that the AI is not sure that they match the correct data field.
After checking the data fields, you should also check the groupings. These are categorised data fields such as line items or tax details. All grouped text boxes are highlighted with the same colour.
Now repeat this step for each of your uploaded documents.
Start The AI Training
Once you have finished annotating the documents, you can start the training.
This means the AI now learns how to process your documents.
You will receive an email once the training is completed, which normally happens within 24 hours!
Integrate Your API
You don’t have to wait for the training to finish: your workflow API is ready right away and can already be integrated! You can find all the information you need, such as code snippets and JSON response examples, in the workflow documentation.
That’s It!
Your API will automatically be adjusted when the training is completed! The training metrics will provide you with more information about the accuracy of your AI Workflow.
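If you would like to run your own issuer-by-issuer spot check in addition to the training metrics, the idea can be sketched in a few lines. This is only an illustration and not how the platform computes its metrics: the function name and the record format (issuer, extracted value, expected value) are assumptions made for this example, and the comparison is a plain exact match after trimming whitespace.

```python
from collections import defaultdict


def accuracy_per_issuer(records):
    """Share of exactly matching field values per issuer.

    `records` is an iterable of (issuer, extracted_value, expected_value)
    tuples, a hypothetical format chosen for this sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for issuer, extracted, expected in records:
        total[issuer] += 1
        # Exact match after trimming surrounding whitespace.
        if extracted.strip() == expected.strip():
            correct[issuer] += 1
    return {issuer: correct[issuer] / total[issuer] for issuer in total}
```

Issuers with a noticeably lower share are good candidates for additional annotated samples.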
If you need support with training your AI model, just contact us and let us know how we can help you!