TextMarginFinder to verify printability

Question

I am attempting to use TextMarginFinder to prove that odd and even pages back up correctly when printing. I have based my code on: http://itextpdf.com/examples/iia.php?id=280

The issue I have is that on odd pages I expect the box to be aligned to the left, showing a 1 cm back margin for example, and on even pages I expect the box to be aligned to the right, also showing a 1 cm back margin. Even in the example above this is not the case, but when printed the text does back up perfectly because the Trim Box conforms.

In summary, I believe that on certain PDF files the TextMarginFinder locates the text width incorrectly, usually on even pages. This is evident from the reported width being greater than that of the actual text. It is usually the case if there are slug marks outside of the Media Box area.

But indeed, if there is text in the slug area, it will be considered as text by the TextMarginFinder. — mkl, Aug 05 '14 at 06:38
The example from the URL above perfectly illustrates the issue. Please see the right side of page xix, where the box is not flush with the text, whereas the left side aligns perfectly with the edge of the characters. — user3909072, Aug 06 '14 at 06:30
*Please see the right side of page xix where the box is not flush with the text* - it is! If you look into the PDF content you'll see that many of the lines have a trailing space character. These trailing space characters are part of the text. If you don't want those space characters to count, you'll have to adapt the code a bit. — mkl, Aug 06 '14 at 08:15
That's very helpful, thank you! It must be generated by certain applications when producing the PDF, as I have many documents that do conform. Would it be possible for you to suggest a way I could adapt the code to overcome this, something like detecting whether the line ends have spaces and reducing the rectangle by a space width? — user3909072, Aug 06 '14 at 20:00

score 0 · Accepted Answer · answered Aug 07 '14 at 08:51

In the PDF the OP pointed to (margins.pdf from the iText samples themselves), the box is indeed not flush with the visible text.

If you look into the PDF content, though, you'll see that many of the lines have a trailing space character, e.g. the first line:

(s I have worn out since I started my ) Tj

These trailing space characters are part of the text. Therefore, the box is not flush with the visible text, but it is flush with the text including those space characters.

If you want to ignore such space characters, you can try doing so by filtering such trailing spaces (or for the sake of simplicity all spaces) before they get fed into the TextMarginFinder. To do this I'd explode the TextRenderInfo instances character-wise and then filter those which trim to empty strings.

A helper class to explode the render info objects:

import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

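public class TextRenderInfoSplitter implements RenderListener
{
    public TextRenderInfoSplitter(RenderListener strategy) {
        this.strategy = strategy;
    }

    // Split each text chunk into its individual characters and forward them one by one.
    @Override
    public void renderText(TextRenderInfo renderInfo) {
        for (TextRenderInfo info : renderInfo.getCharacterRenderInfos()) {
            strategy.renderText(info);
        }
    }

    // All other events are passed through to the wrapped listener unchanged.
    @Override
    public void beginTextBlock() {
        strategy.beginTextBlock();
    }

    @Override
    public void endTextBlock() {
        strategy.endTextBlock();
    }

    @Override
    public void renderImage(ImageRenderInfo renderInfo) {
        strategy.renderImage(renderInfo);
    }

    // The wrapped listener that receives the per-character render infos.
    final RenderListener strategy;
}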

Using this helper you can update the iText sample like this:

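// Imports needed by this fragment (src and RESULT come from the iText sample):
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.FilteredRenderListener;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextMarginFinder;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

// Reject every text chunk that consists of whitespace only.
RenderFilter spaceFilter = new RenderFilter() {
    public boolean allowText(TextRenderInfo renderInfo) {
        return renderInfo != null && renderInfo.getText().trim().length() > 0;
    }
};

PdfReader reader = new PdfReader(src);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(RESULT));
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    TextMarginFinder finder = new TextMarginFinder();
    // Chain: splitter (one character at a time) -> space filter -> margin finder.
    FilteredRenderListener filtered = new FilteredRenderListener(finder, spaceFilter);
    parser.processContent(i, new TextRenderInfoSplitter(filtered));
    // Draw the detected text box on top of the page content.
    PdfContentByte cb = stamper.getOverContent(i);
    cb.rectangle(finder.getLlx(), finder.getLly(), finder.getWidth(), finder.getHeight());
    cb.stroke();
}
stamper.close();
reader.close();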

In the result, the box is flush with the visible text.

In case of slug area text etc., you might want to filter more, e.g. anything outside the crop box.
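The crop box idea can be sketched without any iText dependency. The class below is a hypothetical stand-in, not iText API: it assumes you have already collected one bounding rectangle per character (for example from the split render infos) and simply ignores every rectangle that is not completely inside the crop box before building the union. The names CropBoxMarginSketch and marginsInside are made up for illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

/** Hypothetical stand-in for the crop box filtering idea; not part of iText. */
final class CropBoxMarginSketch {

    /**
     * Returns the union of all glyph boxes that lie completely inside cropBox,
     * or null if no glyph box qualifies.
     */
    static Rectangle2D marginsInside(Rectangle2D cropBox, List<Rectangle2D> glyphBoxes) {
        Rectangle2D result = null;
        for (Rectangle2D glyph : glyphBoxes) {
            // Rectangle2D.contains rejects empty rectangles and anything that
            // pokes out of the crop box, e.g. slug text or printer marks.
            if (!cropBox.contains(glyph)) {
                continue;
            }
            if (result == null) {
                result = (Rectangle2D) glyph.clone();
            } else {
                result.add(glyph); // grow the union to include this glyph
            }
        }
        return result;
    }
}
```

In a real iText pipeline the same test would sit in a RenderFilter placed in front of the TextMarginFinder, next to the space filter; how you derive the per-character rectangle there is left open in this sketch.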

Beware, though, there might be fonts in which the space character is not invisible, e.g. a font of boxed characters. Taking the spaces out of the equation in that case would be wrong.
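Coming back to the original goal of checking that odd and even pages back up: once the text box is reliable, the back margin can be compared per page. The following is only a minimal sketch under stated assumptions: odd pages are bound on the left and even pages on the right, all coordinates are in PDF points (72 per inch), and the page edges passed in are those of whichever page box you consider authoritative (the question suggests the Trim Box). The class name, method names and the tolerance parameter are hypothetical.

```java
/** Hypothetical back-margin check; names and tolerance are illustrative assumptions. */
final class BackMarginSketch {

    /** PDF user space defaults to 72 points per inch, and one inch is 2.54 cm. */
    static final double POINTS_PER_CM = 72.0 / 2.54;

    /**
     * Distance in points from the binding edge of the page to the text box.
     * Assumes odd pages are bound on the left and even pages on the right.
     */
    static double backMargin(int pageNumber, double pageLeft, double pageRight,
                             double textLlx, double textUrx) {
        boolean odd = pageNumber % 2 == 1;
        return odd ? textLlx - pageLeft : pageRight - textUrx;
    }

    /** True if the back margin matches expectedCm within toleranceCm. */
    static boolean hasBackMargin(int pageNumber, double pageLeft, double pageRight,
                                 double textLlx, double textUrx,
                                 double expectedCm, double toleranceCm) {
        double actualCm = backMargin(pageNumber, pageLeft, pageRight, textLlx, textUrx)
                / POINTS_PER_CM;
        return Math.abs(actualCm - expectedCm) <= toleranceCm;
    }
}
```

With the loop from the answer, textLlx and textUrx would come from finder.getLlx() and finder.getUrx() for page i, and a page failing hasBackMargin(i, ..., 1.0, someTolerance) would be the one to inspect.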

Thanks again, I can confirm I have tried this approach and it works on all of my PDF files so far. — user3909072, Aug 07 '14 at 19:01
