iTextSharp v5 GetTextFromPage() throws IndexOutOfRangeException

Question

I'm trying to extract the text content of a PDF with the following code:

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

PdfReader reader = new PdfReader(path);
string strText = string.Empty;

// iTextSharp page numbers are 1-based
for (int page = 1; page <= reader.NumberOfPages; page++)
{
    string s = PdfTextExtractor.GetTextFromPage(reader, page);
    strText += " " + s;
}
reader.Close();

NumberOfPages returns 257, but at page 227, GetTextFromPage() throws an IndexOutOfRangeException.

Any help is appreciated.

hofnarwillie

iTextSharp has very little error handling in it. The exception could easily have nothing to do with requesting a nonexistent page and instead come from something buried deep inside the code. I have had weird errors like this many times, and the problem was unrelated to the error message. Get the source and step into it. — Ben Robinson, Dec 20 '11 at 16:43
I've heard other reports of IndexOutOfRangeExceptions getting thrown by GetTextFromPage when pages exist, but I haven't seen any solutions. Ben's recommendation to step into the code is probably your best bet. — jball, Dec 20 '11 at 16:47
Thanks for the comments. I have already given a half-hearted attempt at that and could not find anything specific. Thought that there might be someone else who has already spent the countless hours stepping through unknown code that could give me a quick solution. In my particular case, I doubt that the benefit gained in finding a solution for this outweighs the time spent in solving it the traditional way. Thanks anyway. — hofnarwillie, Dec 20 '11 at 16:56
If you know for sure that page 227 is the problem: [1] extract the page into a stand-alone PDF, [2] write a short stand-alone program that demonstrates the error, and [3] submit a bug report - http://sourceforge.net/tracker/?group_id=72954&atid=536236 — kuujinbo, Dec 20 '11 at 18:32
Along with what @kuujinbo said, if you can post a link to the PDF here I'd debug through it, too. — Chris Haas, Dec 20 '11 at 20:07
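
To follow up on the suggestions above about confirming which page fails, here is a minimal sketch that wraps each page in its own try/catch so extraction continues and the failing page numbers are recorded. It assumes iTextSharp 5.x; the class name PageTextProbe, the method ExtractText, and the failedPages parameter are hypothetical names introduced for illustration, not part of the library. It only isolates the problem and does not fix the underlying parser error.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static class PageTextProbe
{
    // Hypothetical helper: extracts text page by page and records
    // the 1-based numbers of pages on which extraction throws.
    public static string ExtractText(string path, List<int> failedPages)
    {
        PdfReader reader = new PdfReader(path);
        StringBuilder text = new StringBuilder();
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                try
                {
                    text.Append(' ');
                    text.Append(PdfTextExtractor.GetTextFromPage(reader, page));
                }
                catch (IndexOutOfRangeException)
                {
                    // Assumption: the failure is limited to this page,
                    // so skip it and keep going with the next one.
                    failedPages.Add(page);
                }
            }
        }
        finally
        {
            // Release the file even if an unexpected exception escapes.
            reader.Close();
        }
        return text.ToString();
    }
}
```

After a call such as PageTextProbe.ExtractText(path, failed), the failed list holds the pages worth extracting into a stand-alone PDF for a bug report.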

score 1 · Accepted Answer · answered Apr 19 '12 at 18:02

I resolved this issue by updating my version of iTextSharp from 5.1 to 5.2.

Scott Scowden

Same problem, upgraded from 5.1 to 5.4 and it was fixed - thanks, I was about to spend hours stepping through the code! – bigtv Aug 14 '13 at 17:44
