Debugging PDF for error

Question

I'm creating PDF files using the PDF Clown Java library.

Sometimes, when opening these files with Adobe Acrobat Reader, I get the famous error message:

"An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."

The error shows up in Adobe Reader only when I scroll down to the 8th page of the attached file and then scroll back up to the 3rd page. Alternatively, zooming out to 33.3% also produces the message.

Just for the record, Foxit reader reads the file flawlessly, as well as other PDF readers like browsers.

My questions are:

What's wrong with my file?? (file is attached)
How can I find what's wrong with it? Is there a tool that tells you where the error lies?

Thanks!

Adobe Acrobat has some profiling profiles that can help there. — Martin Schröder, Sep 15 '13 at 21:08
I tried checking it with preflight, and for each check it gave me "An error occurred while parsing a content stream. Unable to analyze the PDF file.". Please help... — user1028741, Sep 16 '13 at 13:18
Same problem here and Preflight fails in my case too... :( So, I guess there is no tool that really tells you where the error is... Well done Adobe. Useless as always... — user2173353, Mar 28 '16 at 14:18

user1028741 · Accepted Answer · 2014-09-28T18:52:34.927

5

OK, this wasn't easy.

Due to a bug in PDF Clown, the main stream of information in the PDF page was corrupted. After its end it had a copy of a past instance of it. This caused a partial text section without the starting command "BT", which left a single "ET" without a "BT" at the end of the stream.

Once I corrected this, it ran great.

Thank you all for your help. I would have had a much more difficult time debugging it without the tool RUPS, which @Bruno suggested.

edit:

The bug was in Buffer.java, in clone() (line 217).

Instead of the line:

clone.append(data);

it needs to be:

clone.append(data, 0, this.length);

Without this correction, clone() copies the whole data array and sets the cloned Buffer's length to data.length. This is very problematic if the Buffer's length is smaller than data.length. In my case, the result was garbage at the end of the stream.
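To illustrate why the one-line change matters, here is a minimal sketch. It uses a simplified stand-in class, SketchBuffer, which is hypothetical and not PDF Clown's actual Buffer implementation; the assumption is only that the real class, like this one, keeps a backing array that can be larger than the valid content.

```java
import java.util.Arrays;

/** Simplified stand-in for a growable byte buffer; NOT PDF Clown's actual Buffer class. */
final class SketchBuffer {
    private byte[] data = new byte[8]; // backing array, usually larger than the content
    private int length = 0;            // number of valid bytes in data

    /** Appends all bytes of src. */
    void append(byte[] src) {
        append(src, 0, src.length);
    }

    /** Appends count bytes of src starting at offset, growing the backing array if needed. */
    void append(byte[] src, int offset, int count) {
        if (length + count > data.length) {
            data = Arrays.copyOf(data, Math.max(data.length * 2, length + count));
        }
        System.arraycopy(src, offset, data, length, count);
        length += count;
    }

    int getLength() {
        return length;
    }

    /** Returns a copy of the valid bytes only. */
    byte[] toByteArray() {
        return Arrays.copyOf(data, length);
    }

    /** Buggy variant: copies the whole backing array, including unused capacity. */
    SketchBuffer cloneBuggy() {
        SketchBuffer clone = new SketchBuffer();
        clone.append(data);
        return clone;
    }

    /** Fixed variant: copies only the first length bytes. */
    SketchBuffer cloneFixed() {
        SketchBuffer clone = new SketchBuffer();
        clone.append(data, 0, length);
        return clone;
    }
}
```

With five valid bytes in an eight-byte backing array, cloneBuggy() yields a buffer of length 8 whose last three bytes are whatever the unused capacity held (zeros in a fresh array, stale content in a reused one), while cloneFixed() yields the expected length of 5.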

edited Sep 28 '14 at 18:52

answered Sep 20 '13 at 19:53

user1028741


(sorry for my late comment, I'm PDF Clown's author) It'd be helpful if you indicated the actual code which caused your issue to happen, so that a constraint may be possibly imposed to avoid it, thanks. – Stefano Chizzolini Sep 27 '14 at 16:21
@StefanoChizzolini, I sent you mail with the solution at the time. Anyway, I edited the answer so it will include the fix. – user1028741 Sep 28 '14 at 18:53
You are absolutely right, it was my fault! I have just retrieved your mail dated Fri, September 20, 2013 10:14 pm: during that period I was taking a hiatus from the project, so I overlooked it, I'm really sorry. Nonetheless, it's always a good thing to post your solution in the first place, as it may benefit any other user. I'm going to include it in the next release of PDF Clown (0.2.0). thank you very much! – Stefano Chizzolini Sep 28 '14 at 19:25

score 4 · Answer 2 · edited Jun 20 '20 at 09:12


The error shows up in Adobe Reader only when I scroll down to the 8th page of the attached file and then scroll back up to the 3rd page. Alternatively, zooming out to 33.3% also produces the message.

Well, I can trigger it more easily: I merely open the PDF and scroll down using the cursor keys. As soon as the top 2 cm of page 3 appear, the message appears.

What's wrong with my file??

The content of pages 1 and 2 looks OK, so let's look at the content of page 3.

I initially attributed the issue to the use of text-specific operations (especially Tf and Tw) outside of a text object. That was wrong, as Stefano Chizzolini pointed out: some text-related operations are indeed allowed outside text objects, namely the text state operations, cf. figure 9 from the PDF specification:

[Figure 9: Graphics Objects]

So while being less common, text state operations at page description level are completely ok.

After my incorrect attempt to explain the issue, the OP's own answer indicated that the

main stream of information in the PDF page was corrupted. After its end it had a copy of a past instance of it. This caused a partial text section without the starting command "BT", which left a single "ET" without a "BT" at the end of the stream.

An ET without a prior BT would indeed be an error, and quite likely it would be accompanied by operations at the wrong level. Inspecting the stream content of that third page (the page this issue focuses on), however, I could not find any unmatched ET. In the course of that inspection I did discover that the content stream contains more than 2000 trailing 0 bytes! Adobe Reader seems unable to cope with these 0 bytes.

The bug the OP found can explain the issue:

in the Buffer.java:clone() (line 217)

instead of line:

clone.append(data);

needs to be:

clone.append(data, 0, this.length);

Without this correction it clones the whole data buffer, and set the cloned Buffer's length to the data[].length. This is very problematic if the Buffer.length is smaller than the data[].length.

Trailing 0 bytes can be an effect of such a buffer copying bug.
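A check for this particular symptom can be automated once the decoded content stream bytes are at hand. The following is a minimal sketch; it assumes you have already obtained the decoded stream as a byte array by some other means (how to extract it is not shown), and the class and method names are made up for illustration.

```java
final class TrailingZeros {
    /** Returns how many consecutive 0x00 bytes end the given decoded stream. */
    static int countTrailingZeroBytes(byte[] stream) {
        int i = stream.length;
        // Walk backwards from the end while the bytes are zero.
        while (i > 0 && stream[i - 1] == 0) {
            i--;
        }
        return stream.length - i;
    }
}
```

A non-zero result on a page content stream is a hint that something like the buffer copying bug above padded the stream with unused buffer capacity.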

Furthermore, the symptoms found by the OP (after its end the stream had a copy of a past instance of it) can also be an effect of such a bug. So I assume the OP found those symptoms on a different page, not page 3, and that fixing the bug cured all symptoms.

How can I find what's wrong with it? Is there a tool that tells you where the error lies?

There are PDF syntax checkers, e.g. the Preflight tool included in Adobe Acrobat, but even that fails on your file.

So essentially you have to extract the page content (using a PDF browser, e.g. RUPS) and check manually with the PDF specification on the other screen.
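Part of that manual check, the pairing of BT and ET, can be scripted. Below is a rough sketch, not a full content stream parser: it assumes the decoded stream is available as a byte array, skips literal strings, hex strings, names, and comments, and does not handle inline image data (BI ... ID ... EI), whose binary bytes could confuse it. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class TextObjectCheck {
    /**
     * Returns the byte offset of the first misplaced BT or ET operator,
     * stream.length if a BT is never closed, or -1 if the pairing looks fine.
     */
    static int firstTextObjectError(byte[] s) {
        boolean inText = false;
        int i = 0;
        while (i < s.length) {
            int c = s[i] & 0xFF;
            if (c == '(') {                       // literal string: honor escapes and nesting
                int depth = 1;
                i++;
                while (i < s.length && depth > 0) {
                    int d = s[i] & 0xFF;
                    if (d == '\\') i++;           // skip the escaped character as well
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                }
            } else if (c == '%') {                // comment: skip to end of line
                while (i < s.length && s[i] != '\n' && s[i] != '\r') i++;
            } else if (c == '<' && i + 1 < s.length && s[i + 1] == '<') {
                i += 2;                           // dictionary start, not a hex string
            } else if (c == '<') {                // hex string: skip to closing '>'
                while (i < s.length && s[i] != '>') i++;
                i++;
            } else if (c == '/') {                // name object such as /F1: never an operator
                i++;
                while (i < s.length && isRegular(s[i] & 0xFF)) i++;
            } else if (isRegular(c)) {            // number or operator token
                int start = i;
                while (i < s.length && isRegular(s[i] & 0xFF)) i++;
                String token = new String(s, start, i - start, StandardCharsets.ISO_8859_1);
                if (token.equals("BT")) {
                    if (inText) return start;     // nested BT
                    inText = true;
                } else if (token.equals("ET")) {
                    if (!inText) return start;    // ET without a prior BT
                    inText = false;
                }
            } else {
                i++;                              // whitespace or other delimiter
            }
        }
        return inText ? s.length : -1;
    }

    /** True for bytes treated here as neither whitespace nor a delimiter. */
    private static boolean isRegular(int c) {
        return c > 32 && "()<>[]{}/%".indexOf(c) < 0;
    }
}
```

Note that text state operators such as Tf outside a text object are deliberately not flagged, since they are allowed at page description level. An offset returned by this sketch only tells you where to start looking in RUPS; it does not replace reading the stream against the specification.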


answered Sep 16 '13 at 14:21

mkl


@Bruno, I thank you so much for your efforts to help me!! I'm trying to study this a bit to understand everything you said :). When I understand what the cause of the problem was, I'll post. – user1028741 Sep 17 '13 at 12:27
It was mkl who helped you. I upvoted his answer and added a link to a short blog post about RUPS. I don't (want to) know PdfClown, but what mkl is telling you, is that you're creating PDF syntax that is illegal according to ISO-32000-1. You (or PdfClown) are mixing text operators and graphics operators, breaking the rules of the specification, resulting in a PDF that is so broken that even Acrobat can't fix it. – Bruno Lowagie Sep 17 '13 at 13:32
I solved the problem (if interested - see answer). The problem wasn't those Tf's outside a text-block. Those Tf's actually set the default fonts' details (I'm not saying it's allowed by the PDF specification - but that's what it actually does...). I thank you all once again. – user1028741 Sep 20 '13 at 19:57
*I'm not saying it's allowed by the PDF specification - but that's what it actually does...* - it may do that in current versions of some pdf viewers but counting on that behavior to be still there in the next version is somewhat risky. – mkl Sep 20 '13 at 23:17
Please @BrunoLowagie and mkl, are you sure that those text operators are illegal?? According to PDF 1.7 (and its derivative ISO-32000-1) **text state operators** (like above-mentioned Tf and Tw) **are pretty legal at page description level (that is OUTSIDE text objects)**!! I don't (want to) know who is Bruno Lowagie, but I would have expected he had endorsed a correct answer, avoiding lazy assumptions that throw discredit over the quality of others' projects. thank you! – Stefano Chizzolini Sep 27 '14 at 16:11
Don't worry @user1028741, text state operators like Tf, despite what Bruno Lowagie said, are absolutely LEGAL outside text objects. ;-) – Stefano Chizzolini Sep 27 '14 at 16:31
@Stefano yes, you are right. I had often seen text positioning or text drawing operations used outside text objects, and path construction operations etc. used inside them, and so assumed such an issue too early. I found out myself sometime early this year that text state operations are legal outside text objects, too. – mkl Sep 27 '14 at 19:45
@Stefano Chizzolini: if you are developing PDF software, please join the ISO committee for PDF so that you're up to date. Another reason: Adobe owns most of the patents with respect to PDF, but grants every one who respects the specs a license to use those patents. That also means that whoever doesn't respect the specs may be infringing a patent. Being a member of the ISO committee gives you access to the people who write the specs, so it is in your interest to join. – Bruno Lowagie Sep 27 '14 at 22:46
@Bruno *I am being proactive and referring to ISO-32000-2.* - the latest draft I saw, 2014-0220, still allowed text state operators in text objects. Has it changed that much in between? – mkl Sep 28 '14 at 00:15
@BrunoLowagie *"I am being proactive and referring to ISO-32000-2"*. - No way, sorry: you were *explicitly* referring to ISO-32000-1, NOT ISO-32000-2! Anyway, referring to ISO-32000-2 would be by any means pointless as we were just reasoning about the compatibility against the *current* spec. – Stefano Chizzolini Sep 28 '14 at 09:01
@mkl No problem, I appreciate your honest intention; I would just ask you to amend your original answer removing the passage about the wrong text operators (other users may still interpret it the wrong way). thank you! – Stefano Chizzolini Sep 28 '14 at 09:12
Many things are in flux right now. Edinburgh will bring some important changes (not necessarily regarding **Tf**) which makes that the spec isn't to be expected before 2016. I don't understand the fuss Stefano Chizzolini is making, though. – Bruno Lowagie Sep 28 '14 at 13:29
I've updated mkl's answer. Based on the fact that the OP was able to fix the problem by rearranging text operators, there must have been some significant problem with those operators as indicated by mkl. – Bruno Lowagie Sep 28 '14 at 14:23
@BrunoLowagie I was irritated that your comments lacked technical accuracy (supposed illegality of text state operators outside text objects according to ISO-32000-1 (ISO-32000-2 is OT in this context)), *especially considering that your initial comment expressed sort of disdain about my project*. I am happy to solve possible issues regarding my library, but I expect fairness and respect (constructive comments, NOT destructive ones!). thank you – Stefano Chizzolini Sep 28 '14 at 15:27
I honestly don't know PdfClown and I don't need to know it because I wrote my own PDF library in 2000. iText was the first PDF library that was capable of being used in a web context. Many other developers tried to copy iText's success. Not many developers also wrote a book, vetted the IP of their code, created a business model to ensure sustained support and to make their product future-proof (e.g. by being part of the ISO committees that write the specs). Those are facts, stripped of emotions such as disdain. – Bruno Lowagie Sep 28 '14 at 15:35
@BrunoLowagie It was exactly because of your role in the IT community that I couldn't comprehend the quality of your contribution in this thread. anyway, that's it! – Stefano Chizzolini Sep 28 '14 at 16:02
@StefanoChizzolini *I would just ask you to amend your original answer* - I updated the answer, and the bug the OP found, might indeed be the cause of the more than 2000 zero bytes at the end of the page 3 content stream. – mkl Sep 29 '14 at 09:00
Thank you mkl for your accurate summary, it's a perfect clarification. – Stefano Chizzolini Sep 29 '14 at 10:43

score 1 · Answer 3 · edited May 23 '17 at 12:07


The general post about debugging PDFs, "How do you debug PDF files?", might also have been helpful, as it mentions RUPS, pdfstreamdump, etc.


answered Sep 03 '14 at 13:39

ebricca

