Python - Extract formatted text (e.g. bold, italics, color) from PDF

Asked Oct 31 '15 at 00:40

Active Oct 31 '15 at 00:45

Viewed 4,596 times

Are there any Python libraries that can extract text from PDFs while preserving formatting (e.g. bold, italics, underline, color)?

I've looked into options such as pdfminer, but to the best of my knowledge they only extract raw text.

edited Oct 31 '15 at 00:45 by Remi Guan

asked Oct 31 '15 at 00:40 by adeora

"The pdfminer documentation says it's possible" - http://stackoverflow.com/q/22329508/2564301 – Jongware Oct 31 '15 at 12:16
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – David van Driessche Oct 31 '15 at 16:50
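
For reference, here is a rough sketch of the direction the first comment points to. It assumes the pdfminer.six fork is installed and uses its `extract_pages` function and `LTChar` layout objects, which expose a `fontname` and `size` per character. The helper names (`style_from_fontname`, `iter_styled_chars`, `styled_runs`) are made up for this illustration, and guessing bold/italic from the font name is only a heuristic. Underline is typically drawn as a separate line in a PDF rather than stored as a text attribute, so it is not covered here, and neither is color.

```python
from itertools import groupby


def style_from_fontname(fontname):
    """Guess (bold, italic) from a PDF font name such as 'ABCDEF+Arial-BoldItalic'.

    Heuristic only: this relies on the font's naming convention, which
    not every PDF follows.
    """
    # Drop the subset prefix ("ABCDEF+") that embedded fonts often carry.
    name = fontname.split("+", 1)[-1].lower()
    bold = "bold" in name
    italic = "italic" in name or "oblique" in name
    return bold, italic


def iter_styled_chars(pdf_path):
    """Yield (text, fontname, size) for each character pdfminer.six lays out."""
    # Imported lazily so the pure-Python helpers work without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # LTAnno objects (inferred spaces/newlines) carry no
                    # font information and are skipped in this sketch.
                    if isinstance(obj, LTChar):
                        yield obj.get_text(), obj.fontname, obj.size


def styled_runs(chars):
    """Group consecutive (text, fontname, size) tuples that share a style."""
    keyed = ((style_from_fontname(font), text) for text, font, _size in chars)
    for (bold, italic), group in groupby(keyed, key=lambda item: item[0]):
        yield "".join(text for _style, text in group), bold, italic
```

Calling `styled_runs(iter_styled_chars("example.pdf"))` (the file name is a placeholder) would yield `(text, bold, italic)` tuples, one per run of characters with the same guessed style.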

0 Answers