I'm currently working on a Python project where I need to extract some information from PDF documents. The list of fields to extract is the same for all the documents. The PDFs are structured documents in various languages and can be thought of as form-like documents.
I'm wondering if there is any machine-learning model, or any other approach, that would let me solve this task. :)
Samples of the different PDF documents: Sample 1, Sample 2
I would like to extract both the currency and the initial issue date, so the output would be (EUR, 30 Janvier 2013) for the first sample and (EUR, 29 January 2009) for the second.
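To make the expected output concrete, here is a naive rule-based baseline, only a sketch and not a real solution. It assumes the text has already been extracted from the PDF into a plain string. The function name `extract_fields`, the short currency list, and the sample strings are all hypothetical and are not taken from the actual documents.

```python
import re

# Hypothetical subset of ISO 4217 currency codes; extend as needed.
CURRENCY_RE = re.compile(r"\b(EUR|USD|GBP|CHF|JPY)\b")

# Day, month name (letters only, any language), four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})\s+([^\W\d_]+)\s+(\d{4})\b")


def extract_fields(text):
    """Return (currency, date) using the first match of each pattern, or None."""
    currency = CURRENCY_RE.search(text)
    date = DATE_RE.search(text)
    return (
        currency.group(1) if currency else None,
        " ".join(date.groups()) if date else None,
    )


# Made-up strings standing in for text extracted from the two samples.
print(extract_fields("Devise : EUR\nDate d'émission initiale : 30 Janvier 2013"))
print(extract_fields("Currency: EUR\nInitial Issue Date: 29 January 2009"))
```

This simply takes the first date and the first currency code it finds, so it would break as soon as a document contains several dates or a different layout, which is why I'm looking for something more robust.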
Maxime