
I have some tricky PDF files that contain white text. To keep that text out of the text stripper's output, I override the processTextPosition method:

private static final int COLOR_WHITE = 16777215; // 0xFFFFFF

@Override
protected void processTextPosition(TextPosition text) {
    PDGraphicsState gs = getGraphicsState();
    PDColor nonStrokingColor = gs.getNonStrokingColor();
    try {
        // Skip text whose fill color is pure white; pass everything else on.
        if (nonStrokingColor.toRGB() != COLOR_WHITE) {
            super.processTextPosition(text);
        }
    } catch (IOException e) {
        logger.error("Could not convert non-stroking color to RGB", e);
    }
}

However, sometimes I still need to extract such white text, namely when it is placed on a colored background. As I understand it, there is usually a filled rectangle underneath the text, but I don't know how to detect that in processTextPosition. Is there any way to do that? File example: example. Here "INCOME" is white text on a green rectangle, and "Huron account" is white text on a blue rectangle.

  • You have to collect all rectangles (or, more generically, paths) that get filled while the text is being extracted, and check whether the rendering color(s) of the currently inspected text (depending on the text rendering mode this may also be the `StrokingColor`!) coincide with the color of the topmost filled path at the location of the text. But beware: for a complete solution you also have to consider blend modes, transparency groups, and so on! – mkl Mar 20 '18 at 11:47
  • Thanks. I am already collecting rectangles by overriding AppendRectangleToPath, but I don't know how to check whether a rectangle is filled. – D.F. Stones Mar 20 '18 at 12:11
  • If it's followed by `CloseFillNonZeroAndStrokePath`, `CloseFillEvenOddAndStrokePath`, `FillNonZeroAndStrokePath`, `FillEvenOddAndStrokePath`, `LegacyFillNonZeroRule`, `FillNonZeroRule`, or `FillEvenOddRule`, yes (but consider the whole path, not merely the single rectangle!). If it's followed by `CloseAndStrokePath`, `StrokePath`, or `EndPath`, no. – mkl Mar 20 '18 at 12:21
  • I see. There I can get linePath.getBounds(), and then I somehow need to get the fill color? – D.F. Stones Mar 20 '18 at 14:12
  • Strictly speaking: at the time the `*Fill*` operator comes, retrieve the `NonStrokingColor`. – mkl Mar 20 '18 at 14:23
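
The bookkeeping described in the comments above could be sketched roughly as follows. This is only a minimal illustration and not PDFBox API: `FilledBackgroundTracker`, `recordFill`, `backgroundAt` and `isTextVisible` are hypothetical names, and the wiring into the fill operators and into processTextPosition is left out. It assumes that path bounds and text positions have already been brought into the same coordinate system, that the bounding box is a good enough stand-in for the real path shape, and that an unpainted page is white. Blend modes, transparency groups, clipping and images are ignored.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: remembers filled areas in painting order. */
class FilledBackgroundTracker {
    // Same value as 16777215 in the question; assumed color of a blank page.
    private static final int COLOR_WHITE = 0xFFFFFF;

    /** One filled area: its bounds and the packed RGB fill color. */
    private static final class Fill {
        final Rectangle2D bounds;
        final int rgb;

        Fill(Rectangle2D bounds, int rgb) {
            this.bounds = bounds;
            this.rgb = rgb;
        }
    }

    private final List<Fill> fills = new ArrayList<>();

    /**
     * Meant to be called when a fill operator is seen, with the bounds of
     * the whole current path and the non-stroking color at that moment.
     */
    void recordFill(Rectangle2D pathBounds, int nonStrokingRgb) {
        fills.add(new Fill(pathBounds, nonStrokingRgb));
    }

    /** Color of the most recently painted fill covering the point, else white. */
    int backgroundAt(double x, double y) {
        // Later fills are painted on top, so search from the end.
        for (int i = fills.size() - 1; i >= 0; i--) {
            Fill f = fills.get(i);
            if (f.bounds.contains(x, y)) {
                return f.rgb;
            }
        }
        return COLOR_WHITE;
    }

    /** Text counts as visible when its color differs from the background. */
    boolean isTextVisible(int textRgb, double x, double y) {
        return textRgb != backgroundAt(x, y);
    }
}
```

In the stripper from the question, the `!= COLOR_WHITE` test would then be replaced by something like `tracker.isTextVisible(nonStrokingColor.toRGB(), x, y)`, with `recordFill` fed from the fill operators listed above. Comparing for exact equality is a deliberate simplification; near-white text on a white page would still pass this check.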
