I'm using PDFBox (1.8.8) to add attachments to a PDF. My problem is that when one of the attachments is a .pdf file and I save the PDDocument to an OutputStream, the final PDF document does not include the attachments. If I save the PDDocument to a file instead of an OutputStream, everything works fine, and if none of the attachments is a PDF, saving to either a file or an OutputStream works fine.

I would like to know if there is any way to add embedded PDF files and save the PDDocument to an OutputStream while keeping the attached files in the generated PDF.

The code I'm using is:

 private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

            final PDDocument doc;
            Boolean hasPdfAttach = false;
            try {
                doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
                // final PDFTextStripper pdfStripper = new PDFTextStripper();
                // final String text = pdfStripper.getText(doc);
                final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
                final Map embeddedFileMap = new HashMap();
                PDEmbeddedFile embeddedFile;
                File file = null;

                for (Attachment attach : attachmentsResources) {

                    // first create the file specification, which holds the embedded file
                    final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
                    fileSpecification.setFile(attach.getFilename());
                    file = AttachmentUtils.getAttachmentFile(attach);
                    final InputStream is = new FileInputStream(file.getAbsolutePath());

                    embeddedFile = new PDEmbeddedFile(doc, is);
                    // set some of the attributes of the embedded file
                    if ("application/pdf".equals(attach.getMimetype())) {
                        hasPdfAttach = true;
                    }
                    embeddedFile.setSubtype(attach.getMimetype());
                    embeddedFile.setSize((int) (long) attach.getFilesize());
                    fileSpecification.setEmbeddedFile(embeddedFile);

                    // now add the entry to the embedded file tree and set in the document.
                    embeddedFileMap.put(attach.getFilename(), fileSpecification);
                    // final String text2 = pdfStripper.getText(doc);
                }
                // final String text3 = pdfStripper.getText(doc);
                efTree.setNames(embeddedFileMap);
                // ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS); (this did not work for me)
                // attachments are stored as part of the "names" dictionary in the document catalog
                final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
                names.setEmbeddedFiles(efTree);
                doc.getDocumentCatalog().setNames(names);
                // final ByteArrayOutputStream pdfboxToDocumentStream = new ByteArrayOutputStream();
                final String tmpfile = "temporary.pdf";
                if (hasPdfAttach) {
                    final File f = new File(tmpfile);
                    doc.save(f);
                    doc.close();
                    // I also tried with the parser, without success
                    // PDFParser parser = new PDFParser(new FileInputStream(tmpfile));
                    // parser.parse();
                    // PDDocument doc2 = parser.getPDDocument();
                    final PDDocument doc2 = PDDocument.loadNonSeq(f, new RandomAccessFile(new File(getHomeTMP()
                            + "tempppp.pdf"), "r"));
                    doc2.save(out);
                    doc2.close();
                } else {
                    doc.save(out);
                    doc.close();
                }
                // this does not work either
                // final InputStream in = new FileInputStream(tmpfile);
                // IOUtils.copy(in, out);
                // out = new FileOutputStream(tmpFile);
                // doc.save (out);

            } catch (IOException e1) {
                e1.printStackTrace();
            } catch (Exception e2) {
                e2.printStackTrace();
            }
        }

Best regards

Solution:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {

    final PDDocument doc;
    try {
        doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
        ((ByteArrayOutputStream) out).reset();
        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        final Map embeddedFileMap = new HashMap();
        PDEmbeddedFile embeddedFile;
        File file = null;

        for (Attachment attach : attachmentsResources) {

            // first create the file specification, which holds the embedded file
            final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
            fileSpecification.setFile(attach.getFilename());
            file = AttachmentUtils.getAttachmentFile(attach);
            final InputStream is = new FileInputStream(file.getAbsolutePath());

            embeddedFile = new PDEmbeddedFile(doc, is);
            // set some of the attributes of the embedded file
            embeddedFile.setSubtype(attach.getMimetype());
            embeddedFile.setSize((int) (long) attach.getFilesize());
            fileSpecification.setEmbeddedFile(embeddedFile);

            // now add the entry to the embedded file tree and set in the document.
            embeddedFileMap.put(attach.getFilename(), fileSpecification);

        }
        efTree.setNames(embeddedFileMap);
        ((COSDictionary) efTree.getCOSObject()).removeItem(COSName.LIMITS);
        // attachments are stored as part of the "names" dictionary in the document catalog
        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        doc.getDocumentCatalog().setNames(names);
        doc.save(out);
        doc.close();

    } catch (IOException e1) {
        e1.printStackTrace();
    } catch (Exception e2) {
        e2.printStackTrace();
    }
}
Fábio Antunes

1 Answer

You store the new PDF after the original PDF in out:

Look at all the uses of out in your method:

private void insertAttachments(OutputStream out, ArrayList<Attachment> attachmentsResources) {
    ...
            doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
    ...
                doc2.save(out);
    ...
                doc.save(out);

So you get as input a ByteArrayOutputStream and take its current content as input (i.e. the ByteArrayOutputStream is not empty but already contains a PDF) and after some processing you append the modified PDF to the ByteArrayOutputStream. Depending on the PDF viewer you present this to, you will be shown either the original or the manipulated PDF or a (very correct) error message that the file is garbage.

If you want the ByteArrayOutputStream to contain only the manipulated PDF, simply add

((ByteArrayOutputStream) out).reset();

or (if you are not sure about the state of the stream)

out = new ByteArrayOutputStream();

right after

doc = PDDocument.load(new ByteArrayInputStream(((ByteArrayOutputStream) out).toByteArray()));
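
The append behaviour can be reproduced without PDFBox at all. The following is only a minimal sketch: the class name `StreamAppendDemo`, its helper methods, and the `"+attachments"` marker are made up for illustration, and plain ASCII text stands in for the PDF bytes. It shows that `toByteArray()` copies the content without clearing the stream, so a later write lands after the old content unless `reset()` is called first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class StreamAppendDemo {

    // Reads the current content and writes a "modified" version back
    // WITHOUT resetting: the new bytes end up after the old ones.
    static String rewriteWithoutReset(ByteArrayOutputStream out) throws IOException {
        byte[] original = out.toByteArray(); // copies the content, does not clear the stream
        out.write(modify(original));
        return out.toString("US-ASCII");
    }

    // Same as above, but discards the old content before writing.
    static String rewriteWithReset(ByteArrayOutputStream out) throws IOException {
        byte[] original = out.toByteArray();
        out.reset();                         // the stream is empty again
        out.write(modify(original));
        return out.toString("US-ASCII");
    }

    // Hypothetical stand-in for "load the PDF, add attachments, save it".
    static byte[] modify(byte[] input) {
        String text = new String(input, StandardCharsets.US_ASCII);
        return (text + "+attachments").getBytes(StandardCharsets.US_ASCII);
    }

    // Builds a stream that already holds some content, like the method's input.
    static ByteArrayOutputStream streamWith(String content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(content.getBytes(StandardCharsets.US_ASCII));
        return out;
    }

    public static void main(String[] args) throws IOException {
        // prints "ORIGINALORIGINAL+attachments"
        System.out.println(rewriteWithoutReset(streamWith("ORIGINAL")));
        // prints "ORIGINAL+attachments"
        System.out.println(rewriteWithReset(streamWith("ORIGINAL")));
    }
}
```

Note that reassigning the `out` parameter inside a method only changes the local variable; the caller still holds the old stream, so `reset()` is the option that is visible to the caller.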

PS: According to the comments the OP tried the above proposed changes to his code without success.

I cannot run the code as presented in the question because it is not self-contained. Thus, I reduced it to the essentials to get a self-contained test:

@Test
public void test() throws IOException, COSVisitorException
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (
            InputStream sourceStream = getClass().getResourceAsStream("test.pdf");
            InputStream attachStream = getClass().getResourceAsStream("artificial text.pdf"))
    {
        final PDDocument document = PDDocument.load(sourceStream);

        final PDEmbeddedFile embeddedFile = new PDEmbeddedFile(document, attachStream);
        embeddedFile.setSubtype("application/pdf");
        embeddedFile.setSize(10993);

        final PDComplexFileSpecification fileSpecification = new PDComplexFileSpecification();
        fileSpecification.setFile("artificial text.pdf");
        fileSpecification.setEmbeddedFile(embeddedFile);

        final Map<String, PDComplexFileSpecification> embeddedFileMap = new HashMap<String, PDComplexFileSpecification>();
        embeddedFileMap.put("artificial text.pdf", fileSpecification);

        final PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
        efTree.setNames(embeddedFileMap);

        final PDDocumentNameDictionary names = new PDDocumentNameDictionary(document.getDocumentCatalog());
        names.setEmbeddedFiles(efTree);
        document.getDocumentCatalog().setNames(names);

        document.save(baos);
        document.close();
    }
    Files.write(Paths.get("attachment.pdf"), baos.toByteArray());
}

As you can see, PDFBox uses only streams here. The result:

[Screenshot: Adobe Reader showing "attachment.pdf" with the attachment "artificial text.pdf"]

Thus, PDFBox has no problem storing a PDF into which it has embedded a PDF file attachment.

The problem, therefore, most likely has nothing to do with this workflow as such.

mkl
  • Hello mkl, thank you, your assumptions are completely right. The creation of that "tempppp.pdf" is one of my attempts to make it work properly (testing), because, as I said in the question, when I save the PDDocument to a file ("tempppp.pdf" in this case) it generates a final PDF document with all the attachments, but when I save the PDDocument to an OutputStream the final PDF does not contain any attachment. I tried your suggestion but it did not work; the final PDF generated by saving the document to an OutputStream does not contain the attachments. "tempppp.pdf" contains all the attachments. – Fábio Antunes Jan 16 '15 at 14:35
  • *"tempppp.pdf" contain all the attachs* - that is interesting... because you nowhere save to that file. – mkl Jan 16 '15 at 14:39
  • According to your code the `tmpfile` variable contains **temporary.pdf**, not **tempppp.pdf**. So... you nowhere save to that file **tempppp.pdf**. – mkl Jan 16 '15 at 14:48
  • hi, im sorry my mistake, temporary.pdf contains all the attachments – Fábio Antunes Jan 16 '15 at 14:49
  • Ok, and then you load that other file, **tempppp.pdf** and put it into the `ByteArrayOutputStream`. Thus, whatever file you return, it is obviously not the file you just saved. – mkl Jan 16 '15 at 14:52
  • Hello mkl, thank you, your assumptions are completely right. The creation of that "tempppp.pdf" is one of my attempts to make it work properly (testing), because, as I said in the question, when I save the PDDocument to a file ("temporary.pdf") it generates a final PDF document with all the attachments, but when I save the PDDocument to an OutputStream the final PDF does not contain any attachment. I tried your suggestion but it did not work; the final PDF generated by saving the document to an OutputStream does not contain the attachments. "temporary.pdf" contains all the attachments. – Fábio Antunes Jan 16 '15 at 14:53
  • But "tempppp.pdf" is just for saving the temporary PDFBox data; at least the PDFBox documentation for the method loadNonSeq says: "Parameters: file file to be loaded, scratchFile location to store temp PDFBox data for this document". Am I wrong? – Fábio Antunes Jan 16 '15 at 15:00
  • Sorry, you're right, I overlooked that `f` and only saw that big `new File(getHomeTMP() + "tempppp.pdf")`. – mkl Jan 16 '15 at 15:05
  • OK, so my question is why, using doc.save(File f), "temporary.pdf" has all the attachments, while if I use doc.save(OutputStream os) (saving the document to an OutputStream) the final PDF does not contain the attachments. Note: remember that this only happens if there is an attachment of type .pdf; otherwise everything works fine. – Fábio Antunes Jan 16 '15 at 15:17
  • Hi mkl, I removed the creation of File f and added your suggestion ( ((ByteArrayOutputStream) out).reset(); ) and it solved my problem :) I will update the question with the solution. Thank you for helping! – Fábio Antunes Jan 16 '15 at 15:37
  • Ah, ok, great. I just added a self-contained test to my answer showing that the workflow as such works alright. – mkl Jan 16 '15 at 15:40