I would like to extract text from PDF files using PDFMiner in a Jupyter Notebook.
Here is an example of a PDF file from which I would like to extract text. When I use the code posted here, the output contains only the footer of page one, and the rest of the document is missing.
However, if I first use the Nitro Pro tool's OCR functionality to manually make the PDF file searchable, the same Python code then extracts all the text from the file.
I checked the PDFMiner documentation to see whether I am setting a parameter incorrectly, but I couldn't find anything on this issue. I need to convert many files, so running each one through Nitro Pro by hand is not feasible.
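Since the OCR'd copy extracts fine, my working assumption is that the original pages are scanned images with no text layer, and only the footer is stored as real text. To show what I mean at batch scale, here is a rough sketch that flags the files that appear to lack a text layer. It assumes the pdfminer.six package (`extract_text` and `PDFPage.get_pages`). The helper names `looks_scanned` and `find_pdfs_needing_ocr` and the 50-characters-per-page threshold are my own placeholders, not anything from the PDFMiner documentation.

```python
from pathlib import Path
from typing import List


def looks_scanned(text: str, pages: int, min_chars_per_page: int = 50) -> bool:
    """Heuristic: very little extracted text per page suggests no text layer.

    The threshold is an arbitrary assumption and may need tuning.
    """
    visible = "".join(text.split())  # drop all whitespace before counting
    return len(visible) < min_chars_per_page * max(pages, 1)


def find_pdfs_needing_ocr(folder: str) -> List[Path]:
    """Return the PDFs in `folder` whose extracted text is suspiciously short."""
    # Imported here so looks_scanned() can be used without pdfminer.six installed.
    from pdfminer.high_level import extract_text
    from pdfminer.pdfpage import PDFPage

    flagged = []
    for path in sorted(Path(folder).glob("*.pdf")):
        # Count pages so the threshold scales with document length.
        with open(path, "rb") as fh:
            pages = sum(1 for _ in PDFPage.get_pages(fh))
        text = extract_text(str(path))
        if looks_scanned(text, pages):
            flagged.append(path)
    return flagged
```

Calling `find_pdfs_needing_ocr("pdfs/")` (the folder name is just an example) would list the files that still need OCR. It does not perform any OCR itself, and that is the part I am trying to automate.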