I am new to PoDoFo and am trying to extract the text content of a PDF file with it. I have essentially followed this, and it works fine with some PDFs but not with others: for the failing files, `std::cout << str << ""` prints nothing. I have tried changing `if(a[i].IsString())` to `if(a[i].IsString() || a[i].IsHexString())`, but this doesn't help. Any ideas about what could cause this? I'm out of ideas and a bit desperate. I know the text can be extracted with PDFminer.six in Python (and the files contain just black-on-white text, nothing fancier), but I'd like to extract the text with C++. Thanks in advance.
-
Please edit your post with more context: [mcve]. – Thomas Matthews Mar 14 '22 at 17:07
-
Recommendation: Take the [tour] and read [ask] and [Writing the Perfect Question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/). – user4581301 Mar 14 '22 at 17:10
-
I don't have any trouble extracting text from the example.pdf with PoDoFo. I can access the first line starting "This printout...", as well as the text in Table 1, the footnotes, and the references. So no problem there. But I have PDFs that look far less complex, and I can't extract text from them! Any ideas what could be behind this? – Ontuvainen Mar 15 '22 at 11:36