I have a two-page PDF file. When I parse it with my own parser, written in Objective-C, I get the following result.
For the first page everything is OK: I get exactly the text I expect, which is the text I can see in PDF readers such as Preview and Adobe Reader. For the second page I get the text that is visible on the second page PLUS part of the text from the first page, even though that text does not appear on the second page.
I tried other parsers. pdftotext (from xpdf) produced the correct result. PDFMiner (in Python, https://pypi.python.org/pypi/pdfminer/) gave the same result as my parser: part of the text from the first page is extracted twice.
My question is: how can this happen? Has anyone seen this situation before? If the text really is present on the second page, why don't PDF readers show it? Do you have any thoughts about this?