Pdfbox - Issue with pdf generation 3.0.0-RC1 vs 2.0.24

Question

I tried to generate a simple PDF with PDFBox 3.0.0-RC1, and the result is a corrupt PDF. On closer inspection, the generated PDF appears to be missing its xref. With PDFBox 2.0.24 the PDF is generated successfully: the xref is there and I can open the file. Does anyone know whether this is an issue with PDFBox 3.0.0-RC1?

Edit:

ByteArrayOutputStream output = new ByteArrayOutputStream();
try (PDDocument document = new PDDocument()) {
    PDPage page = new PDPage();
    document.addPage(page);

    try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
        contentStream.beginText();
        contentStream.setFont(PDType1Font.TIMES_ROMAN, 12);
        contentStream.setLeading(14.5f);
        contentStream.newLineAtOffset(25, 700);
        contentStream.showText("Hello World");
        contentStream.endText();
        // Make sure that the content stream is closed:
        contentStream.close();
    }

    document.save(output);
} catch (Exception e) {
    e.printStackTrace();
}

Edit2:

%PDF-1.6
%öäüß
1 0 obj
<<
/Type /Catalog
/Version /1.4
/Pages 2 0 R
>>
endobj
7 0 obj
<<
/Length 50
/Filter /FlateDecode
>>
stream
xs
áÒw3T04RIã240P0â.
Ô|ðü¢M,.×.
Ð
endstream
endobj
8 0 obj
<<
/Length 189
/Type /ObjStm
/N 5
/Filter /FlateDecode
/First 27
>>
stream
xUÍ
Â0_e^@·ikKA
endstream
endobj
9 0 obj
<<
/Length 33
/Root 1 0 R
/ID [<B8D11B08CDC0D3C46CF107ADC4249370> <B8D11B08CDC0D3C46CF107ADC4249370>]
/Type /XRef
/Size 10
/Index [0 9]
/W [1 1 1]
/Filter /FlateDecode
>>
stream
xc`øÏÈÏÀÄDLLLÌL,~'
endstream
endobj
startxref
493
%%EOF

Edit3: Link to pdf https://easyupload.io/mz4fsk

Welcome to StackOverflow. Your question isn't well suited for this site as SO is not a discussion forum. It probably makes more sense to ask it in the PDFBox mailing list or open a JIRA issue with them. — Codo, Aug 24 '21 at 09:21
Maybe. Please retry with a snapshot https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/3.0.0-SNAPSHOT/ If it still happens, please open a ticket in JIRA and attach your code and the result PDF. — Tilman Hausherr, Aug 24 '21 at 09:28
*"the pdf generated is missing xref"* - how did you determine that? PDFBox indeed by default now uses xref streams instead of xref tables. Could it be that you were looking only for xref tables? Other than that indeed please provide enough code to reproduce your issue and share an example corrupt PDF output. — mkl, Aug 24 '21 at 11:18
You might be right. I was only looking for xref tables, not xref streams. But have there been any changes to how you create a PDF? The same code works in 2.0.24. — oveb, Aug 24 '21 at 11:54
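To make the xref table versus xref stream distinction concrete, here is a minimal sketch that reads the offset after the last `startxref` and reports what it points to. `XrefKind` and `detect` are hypothetical names, not part of PDFBox, and the check is deliberately naive (it ignores hybrid-reference files and incremental updates). It uses only the Java standard library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of PDFBox): reports what the last
// "startxref" entry of a PDF points to.
class XrefKind {

    static String detect(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return "none";
        }
        // Skip whitespace, then read the decimal byte offset that follows the keyword.
        int pos = marker + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == start || pos - start > 9) {
            return "none"; // no offset, or too many digits for an int
        }
        int offset = Integer.parseInt(text.substring(start, pos));
        if (offset >= text.length()) {
            return "none";
        }
        if (text.startsWith("xref", offset)) {
            return "table"; // classic cross-reference table
        }
        // Otherwise expect an indirect object whose dictionary has /Type /XRef.
        int objEnd = text.indexOf("endobj", offset);
        String object = objEnd < 0 ? text.substring(offset) : text.substring(offset, objEnd);
        return object.contains("/XRef") ? "stream" : "none";
    }

    public static void main(String[] args) {
        // Synthetic sample shaped like the dump in Edit2: startxref points at an /XRef object.
        String body = "9 0 obj\n<< /Type /XRef /Size 10 >>\nstream\nendstream\nendobj\n";
        String pdf = "%PDF-1.6\n" + body + "startxref\n9\n%%EOF\n";
        System.out.println(detect(pdf.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

For the synthetic sample in `main` this reports `stream`. In the dump from Edit2, `startxref 493` is followed by nothing resembling a classic table, but object 9 carries `/Type /XRef`, which is the same situation.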
Please share the result PDF. Your code looks good. (You're closing the content stream twice but that shouldn't be a problem) — Tilman Hausherr, Aug 24 '21 at 12:59
I tested your code, writing the result to a file using `Files.write(new File(RESULT_FOLDER, "HelloWorld.pdf").toPath(), output.toByteArray())`, the `RESULT_FOLDER` denoting just that, an existing folder. The output is a valid PDF file. In which way is the output you got corrupt? — mkl, Aug 24 '21 at 14:29
Concerning your second edit: Due to the use of compressed object streams and xref streams pasting a PDF as text is not helpful. In principle the PDF looks ok, though. Please explain what you mean by corruption. — mkl, Aug 24 '21 at 16:00
One issue, though: The `/Version /1.4` makes no sense. Created [PDFBOX-5265](https://issues.apache.org/jira/browse/PDFBOX-5265). — mkl, Aug 24 '21 at 16:23
When I try to open the PDF file I get an error message: "Error Failed to load PDF document." I have tried opening it in Chrome, Edge, and Adobe. Can I add the PDF file as an attachment here? — oveb, Aug 24 '21 at 17:02
Upload to a sharehoster. But first save directly to a file. Don't copy from a screen. — Tilman Hausherr, Aug 24 '21 at 18:17
The file you shared looks like you did not share it as it came out of the `ByteArrayOutputStream` (compare my `Files.write` instruction above) but instead as if you had made it a `String` (or character stream) assuming Latin1 encoding and then made it a byte array or file again using UTF-8, damaging the file by that procedure. Please try and store the PDF using my `Files.write` instruction above right after `document.save(output)` and check that file. And please explain, what exactly do you do with the contents from the `ByteArrayOutputStream` until you store the result file? — mkl, Aug 25 '21 at 13:04
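The damage described above can be reproduced with the standard library alone. The sketch below is only an illustration of the suspected mishap, not the asker's actual code: `TranscodingDamage` and `latin1ThenUtf8` are made-up names, and the sample bytes assume that the `%öäüß` header comment from Edit2 consists of the Latin-1 bytes `F6 E4 FC DF`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows what happens when binary PDF bytes
// are decoded as Latin-1 text and then re-encoded as UTF-8.
class TranscodingDamage {

    static byte[] latin1ThenUtf8(byte[] original) {
        // Step 1: treat the binary data as Latin-1 text (lossless on its own).
        String asText = new String(original, StandardCharsets.ISO_8859_1);
        // Step 2: write the text back as UTF-8; every byte >= 0x80 becomes two bytes.
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // '%' + four high bytes + newline (assumed content of the header comment line).
        byte[] original = {0x25, (byte) 0xF6, (byte) 0xE4, (byte) 0xFC, (byte) 0xDF, 0x0A};
        byte[] damaged = latin1ThenUtf8(original);
        System.out.println(original.length + " bytes before, " + damaged.length + " bytes after");
        System.out.println("identical: " + Arrays.equals(original, damaged));
    }
}
```

Pure ASCII parts of the file survive this round trip, which is why the damaged PDF still looks plausible in a text editor. The compressed streams do not survive, and because the file grows, the byte offset after `startxref` no longer points at the cross-reference object.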
Yes, writing the PDF directly to a file works. The file in Edit3 had been encoded to a string to prepare it for download: `String file = Base64.getEncoder().encodeToString(IOUtils.toByteArray(input));` So it seems something is wrong with the encoding. — oveb, Aug 25 '21 at 16:31
The Base64-encoded file string is decoded with `window.atob(file)`. The atob function does not properly decode the Unicode characters. — oveb, Aug 26 '21 at 12:34
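Base64 itself is lossless as long as the decoded result is kept as bytes. The client in this thread is JavaScript, so the following Java sketch only illustrates the principle; `Base64RoundTrip` is a hypothetical class name, and the sample bytes stand in for the real PDF.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: Base64 round trip that never leaves the byte domain.
class Base64RoundTrip {

    // Server side: binary PDF bytes to a transport-safe ASCII string.
    static String encode(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    // Receiving side: decode straight back to bytes, not to text.
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Stand-in for the PDF: header bytes including values above 0x7F.
        byte[] original = {0x25, 0x50, 0x44, 0x46, (byte) 0xF6, (byte) 0xE4, (byte) 0xFC, (byte) 0xDF};
        byte[] restored = decode(encode(original));
        System.out.println("identical: " + Arrays.equals(original, restored));
    }
}
```

The round trip restores the input exactly. Corruption only enters when the decoded data is handled as a character string and then re-encoded, as in the Latin-1 to UTF-8 case discussed earlier in the comments.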
"A String, representing the decoded string". A PDF file isn't a string. It is binary data. (But I'm not a JS developer) How about this answer? https://stackoverflow.com/questions/21797299/ — Tilman Hausherr, Aug 26 '21 at 18:13
