
I'm having trouble getting this code to work. The goal is to merge PDF files into a PDF that is already loaded in a PDDocument object. I don't want to use PDFBox's merge utility (PDFMergerUtility) because it requires closing the PDDocument object. I have a lot of data to process, and I process it in a loop. Loading and closing a PDDocument every time would take too much time and too many resources (maybe I'm wrong, but that's how it feels).

Here is how I do it:

for (String path:pathList) {
    /* ... */
    if(path.endsWith("pdf")){
        File pdfToMerge = new File(path);
        try(PDDocument pdfToMergeDocument = PDDocument.load(pdfToMerge)){
            for (int pageIndex = 0; pageIndex < pdfToMergeDocument.getNumberOfPages(); pageIndex++){
                PDPage page = pdfToMergeDocument.getPage(pageIndex);
                doc.addPage(page);
            }
            // report success only after every page of this file was added
            System.out.println("Pdf : " + path + ANSI_GREEN +"  [OK]" + ANSI_RESET);
        }catch (IOException e){
            System.out.println("Pdf : " + path + ANSI_RED + "  [FAILED]" + ANSI_RESET);
        }
    }
}
doc.save("src/Kairos/OutPut/"+pdfName[pdfName.length - 1]+".pdf");
doc.close();

The error happens when I try to save the document, at line 65 of Main.java (the doc.save(...) call).

I get this error message:

Exception in thread "main" java.io.IOException: COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?
at org.apache.pdfbox.cos.COSStream.checkClosed(COSStream.java:83)
at org.apache.pdfbox.cos.COSStream.createRawInputStream(COSStream.java:133)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromStream(COSWriter.java:1214)
at org.apache.pdfbox.cos.COSStream.accept(COSStream.java:402)
at org.apache.pdfbox.cos.COSObject.accept(COSObject.java:158)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObject(COSWriter.java:521)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteObjects(COSWriter.java:459)
at org.apache.pdfbox.pdfwriter.COSWriter.doWriteBody(COSWriter.java:443)
at org.apache.pdfbox.pdfwriter.COSWriter.visitFromDocument(COSWriter.java:1108)
at org.apache.pdfbox.cos.COSDocument.accept(COSDocument.java:449)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1381)
at org.apache.pdfbox.pdfwriter.COSWriter.write(COSWriter.java:1268)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1334)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1305)
at org.apache.pdfbox.pdmodel.PDDocument.save(PDDocument.java:1293)
at Kairos.Main.main(Main.java:65)
Hugo Chittaro
  • @FedericoklezCulloca `doc` is declared at the beginning of the file; it's created with the CreatePDFA class from Apache's examples. I checked the code without this part and everything is fine: I get no error and I can save the document. The problem really is in this block. If you want, I can edit my post to add the full code. – Hugo Chittaro Jul 30 '19 at 08:55
  • To summarize the answer: close `pdfToMergeDocument` only after saving `doc`. – Tilman Hausherr Jul 30 '19 at 09:18
  • @TilmanHausherr as per my other comment, no. It's the destination file that's being closed too soon. This has nothing to do with the source files. – Federico klez Culloca Jul 30 '19 at 09:19
  • The destination document `doc` is saved before it gets closed. `pdfToMergeDocument` is getting closed much earlier due to the try-with-resources syntax. And the exception happens when saving. – Tilman Hausherr Jul 30 '19 at 09:41
  • @TilmanHausherr please re-read my answer below. There's a loop in there, which closes `doc` at the end of the first iteration. On the second iteration `save` fails because `doc` is closed. – Federico klez Culloca Jul 30 '19 at 09:53
  • Oops, yes, indeed. (But my comment may still apply.) – Tilman Hausherr Jul 30 '19 at 10:54
  • @TilmanHausherr indeed. I just re-read the documentation for the `PDDocument::addPage` method. It doesn't make it clear, but it does not make a copy. I'll amend my answer with a solution to this later. Thanks for your comments – Federico klez Culloca Jul 30 '19 at 11:00
  • @TilmanHausherr My bad, I'm ashamed I didn't post the code correctly. Please forgive me. I just edited the code to match what is in my file. Sorry again. – Hugo Chittaro Jul 31 '19 at 10:56
  • Please don't apologize here, everybody makes mistakes, and it is one of the purposes of this site to find them. – Tilman Hausherr Jul 31 '19 at 11:04
  • @TilmanHausherr I just tried the solution you pointed out in the answer, but it turns out it didn't work. I'll try to find another way to merge PDFs the way I did, maybe by using streams. – Hugo Chittaro Jul 31 '19 at 11:10

1 Answer


Consider this: you have a list of Strings in pathList and you iterate through it.

At the end of the first iteration you save doc and close it.

Then you loop again and try to save doc, which you closed in the previous iteration.

If your objective is to put the contents of all the pdfs in pathList inside the pdf pointed to by doc, you have to close it outside the loop, after you looped over all of pathList.

EDIT:

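As pointed out by Tilman Hausherr, there's another problem. When you call addPage you're not making a copy of the original page; you're more or less linking to it. Since you're using a try-with-resources construct, the source document gets closed at the end of the try-catch. As soon as you exit the construct, the streams that the added page still points to are closed, and doc can no longer read them when you save it (hence "COSStream has been closed and cannot be read"). So you either have to save doc before the source documents get closed, or use importPage instead, which copies the page's content stream (and then calls addPage anyway). Be aware that importPage does not copy the page's resources (fonts, images), which still belong to the source document, so the safest rule is to close the source documents only after doc has been saved. The importPage change looks like this: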

PDPage page = pdfToMergeDocument.getPage(pageIndex);
doc.importPage(page);
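A minimal sketch of the "close the sources only after saving doc" approach from the comments, assuming PDFBox 2.x (where PDDocument.load exists) and a doc created elsewhere (e.g. with the CreatePDFA example). The class DeferredCloseMerge and the method mergeInto are hypothetical names, the ANSI colour constants from the question are left out, and copying inherited resources with setResources is my own assumption about what importPage leaves out, not something the original post states.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper; assumes PDFBox 2.x on the classpath.
public class DeferredCloseMerge {

    // Appends all pages of the PDFs in pathList to doc and saves doc to outputPath.
    // Source documents are closed only AFTER doc.save(), because the imported
    // pages still share objects (e.g. fonts, images) with them.
    static void mergeInto(PDDocument doc, List<String> pathList, String outputPath)
            throws IOException {
        List<PDDocument> sources = new ArrayList<>();
        try {
            for (String path : pathList) {
                if (!path.endsWith(".pdf")) {
                    continue; // skip non-PDF entries, as the question does
                }
                try {
                    PDDocument source = PDDocument.load(new File(path));
                    sources.add(source); // keep it open until doc is saved
                    for (int i = 0; i < source.getNumberOfPages(); i++) {
                        PDPage page = source.getPage(i);
                        // importPage copies the content stream into doc ...
                        PDPage imported = doc.importPage(page);
                        // ... but not resources inherited from the source page tree,
                        // so set them explicitly (assumption: harmless when the page
                        // already has its own /Resources entry).
                        PDResources resources = page.getResources();
                        if (resources != null) {
                            imported.setResources(resources);
                        }
                    }
                    System.out.println("Pdf : " + path + "  [OK]");
                } catch (IOException e) {
                    System.out.println("Pdf : " + path + "  [FAILED]");
                }
            }
            doc.save(outputPath); // sources are still open here
        } finally {
            // Only now is it safe to release the source documents.
            for (PDDocument source : sources) {
                try {
                    source.close();
                } catch (IOException e) {
                    System.err.println("Could not close a source document: " + e);
                }
            }
        }
    }
}
```

Collecting the sources in a list and closing them in a finally block is what keeps their streams readable during save(); the trade-off is that every source file stays open (and partly in memory) until the merged file has been written.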

EDIT 2:

Of course this answer is now wrong because OP posted the wrong code in the original question :) I'll leave this here in case anyone needs it.

Federico klez Culloca
  • Indeed: you should not close the source documents before having saved the destination document, because the destination document still accesses resources from the source documents. – Tilman Hausherr Jul 30 '19 at 09:16
  • I'm sorry, but the loop part of the answer doesn't apply; that was just my mistake. I edited the code to match what is in my file. But the edit to your answer may be the solution. I'll try it. – Hugo Chittaro Jul 31 '19 at 10:58
  • It turns out that the `doc.importPage(page);` solution doesn't work; I get the same error as in my post. – Hugo Chittaro Jul 31 '19 at 11:08
  • Then edit your question to add the new code (don't remove the old one). – Tilman Hausherr Jul 31 '19 at 11:10