Manipulate PDF objects

Question

I am trying to manipulate a PDF that functions as a template: I want to replace 'placeholders' in the PDF template with my data. For example, someone creates a PDF template in Scribus and adds an empty image named "company_logo". My application finds the image placeholder named "company_logo" and puts the company logo there.

I can browse AcroFields with iTextSharp library and set text in a text field (for example) but AcroFields doesn't list the image placeholder. I've got the feeling that AcroFields is not what I am looking for.

So how can I get a list (or tree) of all objects in the PDF and read their properties (such as position, size and contents)?

P.S. I don't necessarily need to use iTextSharp; any other PDF library will do as well. Preferably free.

Here is a little pseudocode to make myself clearer:

var object = Pdf.GetObjectById("company_logo");
object.SetValue(myImage);
object.SetPosition(x, y);

First, iTextSharp isn't free, it is open source and there's a really big difference. Second, you are correct that `AcroFields` are not the path you should be going down. Third, if you are thinking of PDFs as templates you are in for some trouble. That all said, read through [the answer here](http://stackoverflow.com/a/8751517/231316) for an (incomplete) example of where to start — Chris Haas, May 06 '15 at 14:10
Your pseudo code reveals a lack of understanding of PDF: the position of an image is **never stored in the image**. If it were, that would mean that you couldn't reuse an Image XObject. This 1-minute video explains when free/open source software can be used *for free* and when a commercial license is needed: https://www.youtube.com/watch?v=QHF3xcWnSD4 — Bruno Lowagie, May 06 '15 at 14:51
Images in PDF files don't necessarily have names or ids. Can you explain how Scribus injects those names into the PDF? — mkl, May 06 '15 at 14:57

score 1 · Accepted Answer · edited May 23 '17 at 12:23

From your pseudo-code example, we understand that you want to replace the stream of an object that contains an image. There are several examples on how to do this.

For instance, in the SpecialID example, we create a PDF where we mark a specific image with a special ID. In the ResizeImage example, we track that image based on that special ID and we replace the stream:

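// reader is the PdfReader for the source PDF; key and value are the PdfName
// pair that the SpecialID example stored in the image dictionary; FACTOR is
// the scale factor. All of them are defined elsewhere in the example.
int n = reader.getXrefSize();
PdfObject object;
PRStream stream;
for (int i = 0; i < n; i++) {
    object = reader.getPdfObject(i);
    // only stream objects can be Image XObjects
    if (object == null || !object.isStream())
        continue;
    stream = (PRStream)object;
    // is this the image we marked with the special ID?
    if (value.equals(stream.get(key))) {
        PdfImageObject image = new PdfImageObject(stream);
        BufferedImage bi = image.getBufferedImage();
        if (bi == null) continue;
        // render a scaled copy of the image and encode it as JPEG
        int width = (int)(bi.getWidth() * FACTOR);
        int height = (int)(bi.getHeight() * FACTOR);
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        AffineTransform at = AffineTransform.getScaleInstance(FACTOR, FACTOR);
        Graphics2D g = img.createGraphics();
        g.drawRenderedImage(bi, at);
        g.dispose();
        ByteArrayOutputStream imgBytes = new ByteArrayOutputStream();
        ImageIO.write(img, "JPG", imgBytes);
        // replace the stream bytes and rewrite the image dictionary
        // so that it describes the new JPEG data
        stream.clear();
        stream.setData(imgBytes.toByteArray(), false, PRStream.NO_COMPRESSION);
        stream.put(PdfName.TYPE, PdfName.XOBJECT);
        stream.put(PdfName.SUBTYPE, PdfName.IMAGE);
        stream.put(key, value);
        stream.put(PdfName.FILTER, PdfName.DCTDECODE);
        stream.put(PdfName.WIDTH, new PdfNumber(width));
        stream.put(PdfName.HEIGHT, new PdfNumber(height));
        stream.put(PdfName.BITSPERCOMPONENT, new PdfNumber(8));
        stream.put(PdfName.COLORSPACE, PdfName.DEVICERGB);
    }
}
// afterwards, the modified reader is written to a new file (e.g. with a PdfStamper)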
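The image-processing half of that loop does not depend on iText at all. The following is a JDK-only sketch of that step, assuming a plain BufferedImage as input (in the example it comes from PdfImageObject.getBufferedImage()); the class ScaleToJpeg and the method scaleToJpeg are hypothetical names used only for illustration:

```java
import java.awt.Graphics2D;
import java.awt.geom.AffineTransform;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ScaleToJpeg {

    /**
     * Scales an image by the given factor and encodes it as JPEG.
     * The dimensions of the result are what the /Width and /Height
     * entries of the image dictionary must declare afterwards.
     */
    static byte[] scaleToJpeg(BufferedImage source, double factor) throws IOException {
        int width = (int) (source.getWidth() * factor);
        int height = (int) (source.getHeight() * factor);
        // TYPE_INT_RGB: 8 bits per RGB component, no alpha channel,
        // matching /BitsPerComponent 8 and /ColorSpace /DeviceRGB
        BufferedImage scaled = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = scaled.createGraphics();
        try {
            g.drawRenderedImage(source, AffineTransform.getScaleInstance(factor, factor));
        } finally {
            g.dispose(); // release the graphics context
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ImageIO.write returns false when no writer exists for the format
        if (!ImageIO.write(scaled, "jpg", out)) {
            throw new IOException("No JPEG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // a blank 200x100 placeholder stands in for the decoded PDF image
        BufferedImage logo = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        byte[] jpeg = scaleToJpeg(logo, 0.5);
        // decode the bytes again to check the dimensions of the result
        BufferedImage check = ImageIO.read(new ByteArrayInputStream(jpeg));
        System.out.println(check.getWidth() + "x" + check.getHeight());
    }
}
```

These bytes are what goes into the stream with DCTDECODE as filter; the image dictionary then only has to describe them correctly.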

You will find another example in the book "The Best iText Questions on StackOverflow", where I answered the following question: "PDF Convert to Black And White PNGs".

I wrote the ReplaceImage example to show how to replace the image:

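// Replaces the bytes and dictionary entries of an existing stream object (orig)
// with those of a new stream. The object number of orig stays the same, so
// every reference to orig now points to the new image.
public static void replaceStream(PRStream orig, PdfStream stream) throws IOException {
    orig.clear();                              // drop the old dictionary entries
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    stream.writeContent(baos);                 // bytes of the new stream, as stored
    orig.setData(baos.toByteArray(), false);   // store them without compressing again
    for (PdfName name : stream.getKeys()) {
        orig.put(name, stream.get(name));      // copy /Filter, /Width, /Height, ...
    }
}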

As you can see, it isn't as trivial as saying:

var object = Pdf.GetObjectById("company_logo");
object.SetValue(myImage);

As I explained in my comment, this doesn't make sense:

object.SetPosition(x, y);

The objects we're manipulating are streams that are used as Image XObjects. The advantage of Image XObjects is that they can be reused. For instance, if you have the same logo on every page, you want to store the bytes of that image only once and reuse the same logo multiple times. This means that the object with the image bytes doesn't know anything about its position. The position is determined in the content stream of the page: it depends on the CTM (current transformation matrix) that is in effect when the image is painted with the Do operator.
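To make that concrete, here is a minimal sketch in plain Java (no PDF library) of how a content stream places an Image XObject. It assumes the image is registered in the page's /Resources under the hypothetical name /Im1; the class ImagePlacement and its methods are illustrative only. An image is always painted into the unit square, and the cm operator sets the CTM [a b c d e f] that maps that square onto the page:

```java
import java.util.Locale;

public class ImagePlacement {

    /**
     * Content-stream fragment that paints the named Image XObject as a
     * width x height rectangle with its lower-left corner at (x, y).
     * q/Q save and restore the graphics state so the CTM change stays local.
     */
    static String placeImage(String resourceName, double x, double y,
                             double width, double height) {
        // Locale.ROOT guarantees a '.' decimal separator, as PDF requires
        return String.format(Locale.ROOT, "q %.2f 0 0 %.2f %.2f %.2f cm /%s Do Q",
                width, height, x, y, resourceName);
    }

    /** Maps a point (u, v) of the unit square through the CTM [a b c d e f]. */
    static double[] transform(double[] ctm, double u, double v) {
        // x' = a*u + c*v + e,  y' = b*u + d*v + f
        return new double[] {
            ctm[0] * u + ctm[2] * v + ctm[4],
            ctm[1] * u + ctm[3] * v + ctm[5]
        };
    }

    public static void main(String[] args) {
        // the same XObject placed twice: the image bytes are stored once,
        // only the operands of cm differ
        System.out.println(placeImage("Im1", 36, 750, 120, 60));
        System.out.println(placeImage("Im1", 400, 36, 60, 30));
        // where does the upper-right corner of the first placement land?
        double[] corner = transform(new double[] {120, 0, 0, 60, 36, 750}, 1, 1);
        System.out.println(corner[0] + ", " + corner[1]);
    }
}
```

Both fragments reference the same /Im1, so the position and size live in the page content, not in the image object. Moving the logo therefore means changing the cm operands in the content stream, not calling something like SetPosition on the image.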

score 0 · Answer 2 · answered Oct 29 '15 at 16:34

Have you looked at Scribus's scripting capabilities? Since you create the template in Scribus, you could also write a short Python script that replaces your placeholders with your final data and exports the final PDF.

Since Scribus 1.5, it is also possible to call Python scripts from the command line.
