
I wanted to extract highlighted text from a PDF, so I started looking at pdfminer, but I could not find any documentation for this specific functionality.

Is this possible at all?

Martin Thoma
magicrebirth

1 Answer


I'm not sure, but please have a look at the script mentioned in this code listing.

Edit: I had to edit my answer because some wiser folks downvoted it. I am trying to provide a solution to a problem that has gone unanswered for more than a year.
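A minimal sketch of the geometric step, under stated assumptions. A highlight is normally stored as a /Highlight annotation whose /QuadPoints array (eight numbers per highlighted quadrilateral) marks a region of the page, not the text itself. The text therefore has to be recovered by matching that region against word positions. This sketch assumes you have already read /QuadPoints (for example from `PDFPage.annots` in pdfminer.six) and built word boxes in the same PDF coordinate space. `quads_to_rects`, `highlighted_words`, and the `(x0, y0, x1, y1, text)` word tuple are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical word format: (x0, y0, x1, y1, text) in PDF user-space points,
# e.g. built from pdfminer layout objects such as LTChar or LTTextLine.
Word = Tuple[float, float, float, float, str]
Rect = Tuple[float, float, float, float]


def quads_to_rects(quad_points: Sequence[float]) -> List[Rect]:
    """Convert a /QuadPoints array (8 numbers per quad) into axis-aligned rects."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects: List[Rect] = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]      # x1, x2, x3, x4
        ys = quad_points[i + 1:i + 8:2]  # y1, y2, y3, y4
        # min/max makes the result independent of the point order,
        # which is not consistent across PDF writers.
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def highlighted_words(quad_points: Sequence[float], words: Sequence[Word]) -> str:
    """Join the words whose box center lies inside any highlighted quad."""
    rects = quads_to_rects(quad_points)
    selected = []
    for x0, y0, x1, y1, text in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if any(rx0 <= cx <= rx1 and ry0 <= cy <= ry1
               for rx0, ry0, rx1, ry1 in rects):
            selected.append(text)
    return " ".join(selected)


if __name__ == "__main__":
    words = [
        (72, 700, 100, 712, "Hello"),
        (104, 700, 140, 712, "highlighted"),
        (144, 700, 170, 712, "world"),
    ]
    # One quad covering the first two words.
    quads = [70, 714, 142, 714, 70, 698, 142, 698]
    print(highlighted_words(quads, words))  # Hello highlighted
```

Testing the center of each word is a simple choice. Words that a highlight only partly covers may need an overlap-ratio threshold instead.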

Rafael
MiniMe
  • Thanks, I tried, but I could not get hold of pypoppler (see the error here https://gist.github.com/lambdamusic/b685596d492335838098) – magicrebirth Nov 20 '15 at 14:18
  • I am actively looking for a solution to this issue. Poppler can be found here: https://pypi.python.org/pypi/python-poppler-qt4/ but I cannot find proper documentation. They send the reader to the C++ docs, which are Chinese to me :-) Currently looking for other Python solutions – MiniMe Nov 20 '15 at 14:57
  • Some clues here: http://poppler.freedesktop.narkive.com/7KbUnSay/python-pdf-hightlighted-text-not-annotation-popup-how-to-extract-it-text and an example here: http://stackoverflow.com/questions/21050551/extracting-text-from-higlighted-text-using-poppler-qt4-python-poppler-qt4 – MiniMe Nov 20 '15 at 15:07
  • I got this to work, but unfortunately it does not extract highlights made with OS X Preview – magicrebirth Nov 20 '15 at 17:08
  • Was it difficult to install Qt4 and poppler? I am at work and I could not test any of the above – MiniMe Nov 20 '15 at 17:59