Are there any Python libraries that can extract text from PDFs while preserving formatting (e.g. bold, italics, underline, color)?

I've looked into options such as pdfminer, but to the best of my knowledge they only extract raw text.

Remi Guan
adeora
  • "The pdfminer documentation says it's possible" - http://stackoverflow.com/q/22329508/2564301 – Jongware Oct 31 '15 at 12:16
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it. – David van Driessche Oct 31 '15 at 16:50
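
As a rough illustration of the pdfminer route mentioned in the first comment, here is a minimal sketch. It assumes the maintained fork pdfminer.six (not the original pdfminer package from the question) and relies on its `extract_pages` function and the `fontname` and `size` attributes of `LTChar`. The helper names `style_from_fontname`, `iter_styled_chars`, and `group_runs` are hypothetical, and guessing bold or italic from the font name is only a heuristic, since PDFs do not store those properties as flags on the text. Underlines are usually drawn as separate line shapes rather than as a font attribute, and color is not covered here.

```python
import re
from itertools import groupby


def style_from_fontname(fontname):
    """Guess (bold, italic) from a PDF font name such as 'ABCDEF+Arial-BoldMT'."""
    # Embedded font subsets often carry a six-letter prefix like 'ABCDEF+'; drop it.
    base = re.sub(r"^[A-Z]{6}\+", "", fontname).lower()
    bold = "bold" in base
    italic = "italic" in base or "oblique" in base
    return bold, italic


def iter_styled_chars(pdf_path):
    """Yield (char, fontname, size) for every character pdfminer.six finds."""
    # Imported here so the pure-Python helpers work without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for obj in line:
                    # LTAnno objects (inferred spaces, newlines) have no font; skip them.
                    if isinstance(obj, LTChar):
                        yield obj.get_text(), obj.fontname, round(obj.size, 1)


def group_runs(chars):
    """Merge consecutive characters that share a font name and size into runs."""
    for (fontname, size), group in groupby(chars, key=lambda c: (c[1], c[2])):
        bold, italic = style_from_fontname(fontname)
        yield {
            "text": "".join(c[0] for c in group),
            "font": fontname,
            "size": size,
            "bold": bold,
            "italic": italic,
        }
```

With pdfminer.six installed, something like `for run in group_runs(iter_styled_chars("example.pdf")): print(run)` would print one dictionary per run of identically styled text; the file name is a placeholder.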

0 Answers