Are there any Python libraries that can extract text from PDFs while preserving formatting (e.g. bold, italics, underline, color)?
I've looked into options such as pdfminer, but to the best of my knowledge they only extract raw text.