I am currently working on a digitalisation project which consists in extracting specific information from pdf-formatted electricity invoices. Once the data is extracted, I would like to store it in an Excel spreadsheet.
The objectives are the following:
First of all, the data to be extracted would be the following:
https://i.stack.imgur.com/6RLo2.png
In this case, the data to be extracted is the information surrounded in red. This would be the CUPS, the total amount and the consumed electricity per period (P1-P6).
Once this is extracted, I would like to display this in an Excel Spreadsheet.
Could you please give me any ideas/tips regarding the extraction of this data? I understand that OCR software would do this best, but do not know how I could extract this specific information.
Thanks for you help and advice.