
I'm maintaining a web application that uses iText 2.1.7 to create PDFs. I want to take the content of an existing PDF and put it into the pdf document that the code is in the middle of creating. I have the following (EDIT: more complete code):

package itexttest;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

public class ITextTest 
{
    public static void main(String[] args) 
    {
        try
        {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
            PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
            bigDoc.open();

            Paragraph par = new Paragraph("one");
            bigDoc.add(par);
            bigDoc.add(new Paragraph("three"));

            addPdfPage(bigDoc, os, "c:/insertable.pdf");

            bigDoc.close();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    private static void addPdfPage(Document document, OutputStream outputStream, String location) {
        try {

            PdfReader pdfReader = new PdfReader(location);
            int pages = pdfReader.getNumberOfPages();

            PdfCopy pdfCopy = new PdfCopy(document, outputStream);
            PdfImportedPage page = pdfCopy.getImportedPage(pdfReader, 1);
            pdfCopy.addPage(page);
        }
        catch (Exception e) {
            System.out.println("Cannot add PDF from PSC: <" + location + ">: " + e.getMessage());
            e.printStackTrace();
        }
    }

}

This throws an error, null from PdfWriter.getPageReference().

How am I using this incorrectly? How can I get a page from the existing document and put it in the current document? Note that in this application it is not at all convenient to write files as temporary storage.

arcy
  • *This throws an error, null from `PdfWriter.getPageReference()`* - I don't see you using this method in your code. Where exactly in your code do you get the exception? – mkl Jan 16 '16 at 07:57
  • Sorry -- from pdfCopy.getImportedPage(pdfReader, 1) – arcy Jan 16 '16 at 17:55
  • Ok, basically your code cannot work as you seem to want. You cannot reuse the document and output stream, first for a PdfWriter, then for a PdfCopy. The error itself is a bit surprising but some error is to be expected. What exactly do you hope to achieve? – mkl Jan 16 '16 at 19:45
  • I have code that produces a PDF. I want to add code that will take the content of an existing pdf and insert that into the PDF that I'm producing. I have a file to read, but I am not producing a file, I'm producing a byte array. – arcy Jan 16 '16 at 20:14

2 Answers


I'm not actively working with the old iText versions anymore, but some things have not changed since then. Thus, here are some issues in your code and pointers to help resolve them:

The main issues in your current code are that you

  • reuse the Document instance (which you already use for your PdfWriter and already have opened) for a PdfCopy; while a Document can support multiple listeners, they all need to be registered before calling open; the use case of this construct is to create the same document in parallel in two different formats; and you

  • use the same output stream for both your PdfWriter and your PdfCopy; the result is not one valid PDF but byte ranges from two different PDFs wildly mixed together, i.e. something that definitely won't be a valid PDF.

Using PdfCopy correctly

You can restructure your code by first creating a new PDF containing your new paragraphs in a ByteArrayOutputStream (closing the Document involved) and then copying this PDF and the other pages you want to add into a new PDF.

E.g. like this:

// Pass 1: create the new content in memory and close that document.
ByteArrayOutputStream os = new ByteArrayOutputStream();
Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
bigDoc.open();
Paragraph par = new Paragraph("one");
bigDoc.add(par);
bigDoc.add(new Paragraph("three"));
bigDoc.close();

// Pass 2: copy the pages of that PDF plus the extra page into a new PDF,
// again written to memory (os2), not to a file.
ByteArrayOutputStream os2 = new ByteArrayOutputStream();
Document finalDoc = new Document();
PdfCopy copy = new PdfCopy(finalDoc, os2);
finalDoc.open();
PdfReader reader = new PdfReader(os.toByteArray());
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    copy.addPage(copy.getImportedPage(reader, i));
}
PdfReader pdfReader = new PdfReader("c:/insertable.pdf");
copy.addPage(copy.getImportedPage(pdfReader, 1));
finalDoc.close();
reader.close();
pdfReader.close();

// result PDF
byte[] result = os2.toByteArray();

Using only PdfWriter

You can alternatively change your code by directly importing the page into your PdfWriter, e.g. like this:

ByteArrayOutputStream os = new ByteArrayOutputStream();
Document bigDoc = new Document(PageSize.LETTER, 50, 50, 110, 60);
PdfWriter writer = PdfWriter.getInstance(bigDoc, os);
bigDoc.open();
Paragraph par = new Paragraph("one");
bigDoc.add(par);
bigDoc.add(new Paragraph("three"));

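// Import page 1 of the other PDF as a template (Form XObject) of this writer.
PdfReader pdfReader = new PdfReader("c:/insertable.pdf");
PdfImportedPage page = writer.getImportedPage(pdfReader, 1);
// Start a new page and draw the template at the origin, unscaled.
bigDoc.newPage();
PdfContentByte canvas = writer.getDirectContent();
canvas.addTemplate(page, 1, 0, 0, 1, 0, 0);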

bigDoc.close();
pdfReader.close();

// result PDF
byte[] result = os.toByteArray();           

This approach appears better because no intermediary PDF is required. Unfortunately this appearance is deceiving; this approach has some disadvantages.

Here the original page is not copied as is into the document; instead, only its content stream is used as the content of a template, which is then referenced from the actual new document page. In particular, this means:

  • If the imported page has different dimensions than your new target document, some parts of it might be cut off while some parts of the new page remain empty. For this reason you will often find variants of the code above that scale and rotate the imported page to make it fit the target page.

  • The original page contents are now in a template which is referenced from the new page. If you import this new page into yet another document using the same mechanism, you get a page which references a template which in turn merely references a template which has the original contents. If you import this page into another document, you get another level of indirectness, and so on.

    Unfortunately conforming PDF viewers only need to support this indirectness to a limited degree. If you continue this process, your page contents suddenly may not be visible anymore. If the original page already brings along its own hierarchy of referenced templates, this may happen sooner rather than later.

  • As only the contents are copied, properties of the original page not in the content stream will be lost. This in particular concerns annotations like form fields or certain types of highlight markings or even certain types of free text.

(By the way, these templates in generic PDF specification lingo are called Form XObjects.)
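The first point above mentions code that scales the imported page so that it fits the target page. As a sketch only, assuming the imported page's box starts at (0, 0) and ignoring page rotation, the six values passed to `addTemplate(page, a, b, c, d, e, f)` for a uniform, centered fit can be computed like this (`FitTransform` and `fit` are hypothetical names, not iText API):

```java
/**
 * Hypothetical helper (not part of iText): computes the matrix values
 * (a, b, c, d, e, f) that uniformly scale a source box of srcW x srcH
 * so it fits inside a target box of dstW x dstH, centered.
 * Assumes both boxes start at (0, 0); page rotation is not handled.
 */
public final class FitTransform {
    private FitTransform() {}

    public static float[] fit(float srcW, float srcH, float dstW, float dstH) {
        if (srcW <= 0 || srcH <= 0 || dstW <= 0 || dstH <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        // Uniform scale: the smaller ratio keeps the whole source page visible.
        float scale = Math.min(dstW / srcW, dstH / srcH);
        // Offsets that center the scaled page inside the target box.
        float dx = (dstW - srcW * scale) / 2f;
        float dy = (dstH - srcH * scale) / 2f;
        // Order matches the a, b, c, d, e, f parameters of addTemplate.
        return new float[] {scale, 0f, 0f, scale, dx, dy};
    }

    public static void main(String[] args) {
        // Example: an A4 page (595 x 842 pt) placed on US Letter (612 x 792 pt).
        float[] m = fit(595f, 842f, 612f, 792f);
        System.out.printf("scale=%.4f dx=%.2f dy=%.2f%n", m[0], m[4], m[5]);
    }
}
```

With the imported page from the PdfWriter example, this might be used as `float[] m = FitTransform.fit(page.getWidth(), page.getHeight(), PageSize.LETTER.getWidth(), PageSize.LETTER.getHeight());` followed by `canvas.addTemplate(page, m[0], m[1], m[2], m[3], m[4], m[5]);`. Taking the smaller of the two ratios keeps the whole page visible at the cost of some empty margin on one axis.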

This answer explicitly deals with the use of PdfCopy and PdfWriter in the context of merging PDFs.

mkl
  • I greatly appreciate you correcting my code; I have copied what you've written and put it into my little example program, and it does more or less what I wanted. The little example also illustrates several things about iText that I just don't get: for example, we start out creating an output stream and a document, add things to the document, and close the document, but the output stream is still available to use later (os in the above). – arcy Jan 18 '16 at 17:11
  • The biggest thing I don't get and have been trying to avoid, however, is that, in order to copy from one document to another, it is seemingly necessary to create a third document. The program I'm working on, for which I created the above example, has a document to which it adds numerous things: images, text, tables, etc. But evidently, in order to add a page from another document, is it really impossible to just read the page and add it? Do I have to create this additional document, copy all the pages from the first one into it, and THEN add in my other page(s)? – arcy Jan 18 '16 at 17:14
  • iText does not have the *one-class-does-it-all* architecture but instead a number of classes for different tasks. In particular there are *PDF creation from scratch* (`Document` plus `PdfWriter`), *merging pdfs* (`Document` plus a `Pdf*Copy*` variant), and *manipulating a Pdf* (`PdfStamper`). This implies that for mixed use cases you have multiple passes with intermediary documents. – mkl Jan 18 '16 at 18:03
  • Could you possibly meet me in the Java chat room? I am wondering if you have a suggested best practice for building up part of a PDF from scratch, then merging another page into it, then doing some more from-scratch stuff to that document, etc. How do I store the doc in between the operations that iText defines as disjoint? – arcy Jan 18 '16 at 18:13
  • Is there a way to get an imported page just from a reader? Maybe I could open a document for reading, get a page, make it an image, and then add the image to an existing document without having to get out of create-from-scratch mode? – arcy Jan 18 '16 at 18:21
  • *Could you possibly meet me in the Java chat room?* - sorry, I'm currently online in a very on/off manner. – mkl Jan 18 '16 at 18:51
  • *Is there a way to get an imported page just from a reader?* - no, an *imported page* is something already ***imported*** into shine target. What you can do, though, is import a page into a `PdfWriter`. This would spare you the intermediary document but the result PDF would be of less quality. – mkl Jan 18 '16 at 18:54
  • Argh! *into shine target*... Into **some** target! Intelligent smart phone keyboards... – mkl Jan 18 '16 at 19:58
  • it could be some shiny target... again, thanks for the help. Am now trying to figure out how to deal with a document we were just keeping around as a Document/OutputStream pair, and now we can't keep that, we have to terminate it in order to add content from another PDF or two. – arcy Jan 18 '16 at 19:59
  • As mentioned above, you can also import to a `PdfWriter`. Then you merely need to extend your *Document/OutputStream pair* into a *Document/OutputStream/PdfWriter triple*. But the result would be not as good as with a `PdfCopy`. – mkl Jan 18 '16 at 20:11
  • Just to let you know -- the phrase "you can also import to a PdfWriter" doesn't tell me enough to be able to try it. I can futz around on the normal trial-and-error basis that's normal for iText, but that phrase doesn't really tell me how to take that option. – arcy Jan 18 '16 at 22:37
  • *doesn't tell me enough to be able to try it.* - cf. my edit. – mkl Jan 19 '16 at 08:39
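The multi-pass approach mkl describes in the comments does not need files: each intermediary PDF can be kept as a `byte[]` between passes. The following is a minimal sketch of such a merge pass along the lines of the `PdfCopy` code in mkl's answer, assuming iText 2.1.7 (`com.lowagie`) on the classpath; `InMemoryMerge` and `merge` are hypothetical names.

```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one PdfCopy pass that works purely on byte arrays. */
public final class InMemoryMerge {
    private InMemoryMerge() {}

    /** Copies every page of each input PDF, in order, into one new PDF kept in memory. */
    public static byte[] merge(List<byte[]> pdfs) throws IOException, DocumentException {
        if (pdfs.isEmpty()) {
            // Closing a Document without any pages would fail anyway.
            throw new IllegalArgumentException("nothing to merge");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, out);
        document.open();
        List<PdfReader> readers = new ArrayList<PdfReader>();
        for (byte[] pdf : pdfs) {
            PdfReader reader = new PdfReader(pdf);
            readers.add(reader);
            // Page numbers in iText start at 1.
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                copy.addPage(copy.getImportedPage(reader, i));
            }
        }
        document.close();
        // As in mkl's answer, the readers are closed only after the document.
        for (PdfReader reader : readers) {
            reader.close();
        }
        return out.toByteArray();
    }
}
```

For example, `InMemoryMerge.merge(Arrays.asList(firstPassBytes, insertBytes))` would return a PDF with all pages of both inputs. This avoids temporary files, not the copying itself: every page of every input is still copied once per pass.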

Here's another version of this that incorporates mkl's corrections; hopefully the names will lend themselves to other questions:

import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;

public class PdfPlay
{
    public static void main(String[] args)
    {
        try
        {
            // Pass 1: build the from-scratch content in memory.
            ByteArrayOutputStream outputStream1 = new ByteArrayOutputStream();

            Document document1 = new Document(PageSize.LETTER, 50, 50, 110, 60);
            // The writer must be registered before open(); the variable is not used again.
            PdfWriter writer1 = PdfWriter.getInstance(document1, outputStream1);

            document1.open();
            document1.add(new Paragraph("one"));
            document1.add(new Paragraph("two"));
            document1.add(new Paragraph("three"));
            document1.close();

            // Pass 2: copy those pages plus the inserted page into a new in-memory PDF.
            byte[] withInsert = addPdfPage(outputStream1, "insertable.pdf");

            // Only for this demo: write the merged bytes to a file to inspect them.
            if (withInsert != null)
            {
                FileOutputStream fileOut = new FileOutputStream("inserted.pdf");
                fileOut.write(withInsert);
                fileOut.close();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

    private static byte[] addPdfPage(ByteArrayOutputStream outputStream1, String insertFilename)
    {
        try
        {
            ByteArrayOutputStream outputStream2 = new ByteArrayOutputStream();
            Document document2 = new Document();
            // Write to outputStream2 (not a file) so the returned byte array holds the result.
            PdfCopy copy = new PdfCopy(document2, outputStream2);
            document2.open();
            PdfReader outputStream1Reader = new PdfReader(outputStream1.toByteArray());
            for (int i = 1; i <= outputStream1Reader.getNumberOfPages(); i++)
            {
                copy.addPage(copy.getImportedPage(outputStream1Reader, i));
            }
            PdfReader insertReader = new PdfReader(insertFilename);
            copy.addPage(copy.getImportedPage(insertReader, 1));

            document2.close();
            outputStream1Reader.close();
            insertReader.close();

            return outputStream2.toByteArray();
        }
        catch (Exception e)
        {
            System.out.println("Cannot add PDF from PSC: <" + insertFilename + ">: " + e.getMessage());
            e.printStackTrace();
            return null;
        }
    }
}

If run with the file 'insertable.pdf' in the default directory for the program, this program produces the file 'inserted.pdf' in the same directory, with lines of text "one", "two", and "three" on the first page, and the first page of the 'insertable.pdf' on the second page.

So mkl's correction works; to use it in the environment where I want to use it, there are a couple of questions:

The program where I want to use this functionality is a web app, so it does not have ready access to a place to write a file. I'm assuming I can use a ByteArrayOutputStream in place of an output file, as done here.

Is it absolutely necessary to create a new output stream in order to insert the content? I was hoping for a way to tell some iText component "here's a file; read its first page and insert it in the document/outputStream/writer I already have open." PDF documents in memory can get quite large; I would rather not have to copy all the existing PDF structures just so I can add another one to them. If I end up inserting pages from more than one other document, I might have to do that multiple times...
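The PdfWriter route from mkl's answer comes close to this wish: keep the Document, the ByteArrayOutputStream and the PdfWriter together (the "triple" from the comments) and draw imported pages into the still-open document. A minimal sketch, assuming iText 2.1.7; `InMemoryPdfBuilder` and its methods are hypothetical names, and the template caveats from mkl's answer (lost annotations and form fields, nested Form XObjects, no resizing) still apply.

```java
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Element;
import com.lowagie.text.Rectangle;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;

import java.io.ByteArrayOutputStream;

/** Hypothetical wrapper keeping the Document/OutputStream/PdfWriter triple together, all in memory. */
public final class InMemoryPdfBuilder {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final Document document;
    private final PdfWriter writer;

    public InMemoryPdfBuilder(Rectangle pageSize) throws DocumentException {
        document = new Document(pageSize);
        // The writer is registered before open(), as required.
        writer = PdfWriter.getInstance(document, out);
        document.open();
    }

    /** Adds ordinary from-scratch content (paragraphs, tables, images, ...). */
    public void add(Element element) throws DocumentException {
        document.add(element);
    }

    /**
     * Draws one page of another PDF on a fresh page as a template, at the origin
     * and unscaled. The reader must stay open until finish() has returned.
     */
    public void appendPage(PdfReader reader, int pageNumber) {
        PdfImportedPage page = writer.getImportedPage(reader, pageNumber);
        document.newPage();
        writer.getDirectContent().addTemplate(page, 0, 0);
        // Move on so later add(...) calls do not overprint the imported page.
        document.newPage();
    }

    /** Closes the document and returns the finished PDF bytes. */
    public byte[] finish() {
        document.close();
        return out.toByteArray();
    }
}
```

Readers passed to `appendPage` are closed by the caller after `finish()`, mirroring the order in mkl's example, since the imported page data is written out when the document is closed. No intermediary PDF is needed, but the quality trade-offs mkl lists are the price.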

arcy