Problems extracting text with PoDoFo

Asked Mar 14 '22 at 17:03

I am new to PoDoFo and am trying to extract the text content of a PDF file with it. I have essentially followed this, and it works fine with some PDFs but not with others: for the failing files, std::cout << str << "" prints nothing. I tried changing if(a[i].IsString()) to if(a[i].IsString() || a[i].IsHexString()), but that did not help. Any ideas about what could cause this? I'm out of ideas. I know the text can be extracted with PDFminer.six in Python, and the files contain only plain black-on-white text, nothing fancier, but I'd like to extract the text with C++. Thanks in advance.
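For context on the IsString() / IsHexString() check: a PDF content stream can write a text operand either as a literal string in parentheses or as a hex string in angle brackets. The sketch below is standalone C++ and does not use PoDoFo. The helper names decodeLiteralString and decodeHexString and the sample operands are hypothetical, chosen only to show that both notations describe the same bytes. Those bytes are character codes in the current font's encoding, so they are not guaranteed to be printable ASCII. Whether that is what happens in the failing files is not confirmed here.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a PoDoFo function): strip the parentheses from a
// literal string operand such as "(Hello)". Escape sequences are not handled.
std::string decodeLiteralString(const std::string& operand) {
    if (operand.size() >= 2 && operand.front() == '(' && operand.back() == ')') {
        return operand.substr(1, operand.size() - 2);
    }
    return operand;
}

// Hypothetical helper (not a PoDoFo function): decode a hex string operand
// such as "<48656C6C6F>" into its raw bytes.
std::string decodeHexString(const std::string& operand) {
    std::string digits;
    for (char c : operand) {
        // Keep hex digits only; this drops the <> delimiters and whitespace.
        if (std::isxdigit(static_cast<unsigned char>(c))) {
            digits.push_back(c);
        }
    }
    // An odd number of digits is treated as if a trailing 0 were present.
    if (digits.size() % 2 != 0) {
        digits.push_back('0');
    }

    std::string bytes;
    for (std::size_t i = 0; i < digits.size(); i += 2) {
        const unsigned long value = std::stoul(digits.substr(i, 2), nullptr, 16);
        bytes.push_back(static_cast<char>(static_cast<unsigned char>(value)));
    }
    return bytes;
}
```

With these helpers, decodeLiteralString("(Hello)") and decodeHexString("<48656C6C6F>") both yield the same five bytes. A hex operand whose bytes are not ASCII codes would decode to non-printable characters instead.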

Ontuvainen

Please edit your post to add more context, ideally a minimal reproducible example: [mcve]. – Thomas Matthews Mar 14 '22 at 17:07
Recommendation: Take the [tour] and read [ask] and [Writing the Perfect Question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/). – user4581301 Mar 14 '22 at 17:10
I don't have any trouble extracting text from the example.pdf with PoDoFo. I can access the first line starting "This printout..." and also text in Table1 and the footnotes and the references. So no problem there. I have PDFs that are far less complex looking and I can't extract text from them! Any ideas what could be behind this? – Ontuvainen Mar 15 '22 at 11:36
