I'm trying to extract text from PDF files that are laid out like a form. Currently, I open the file in Chrome, select and copy all the text, paste it into a txt file, and process that into CSV using Python. The text copied from Chrome is fairly structured and uniform: every page of the PDF produces a similar block of text, which makes it easy to process.
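To illustrate the processing step, here is a minimal sketch of what the Python side looks like. The `Label: value` layout, the blank line between page blocks, and the names `blocks_to_rows` and `rows_to_csv` are assumptions made for the example, not my real form.

```python
import csv
import io


def blocks_to_rows(text):
    """Split pasted text into one dict per page block.

    Assumes blocks are separated by a blank line and each line
    looks like 'Label: value' (hypothetical layout).
    """
    rows = []
    for block in text.strip().split("\n\n"):
        row = {}
        for line in block.splitlines():
            if ":" not in line:
                continue  # skip lines that are not label/value pairs
            label, value = line.split(":", 1)
            row[label.strip()] = value.strip()
        if row:
            rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text, leaving missing fields empty."""
    # Collect column names in the order they first appear.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Two page blocks; the second one has an empty "City" field.
sample = "Name: Alice\nCity: Paris\n\nName: Bob\nCity:\n"
print(rows_to_csv(blocks_to_rows(sample)))
```

The text pasted from Chrome is regular enough for this kind of parsing to work page after page.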
I'd like to skip the manual step and extract the text directly from the PDF before converting it to CSV, but the output is always messy because of the way the original PDF was generated. I've tried pdfminer and PyPDF2, and with both the extracted text becomes inconsistent whenever a form field has no value.
This may be a rather general question, but how can I get a more structured result from the extraction?