
We have a requirement to remove an annotation when some conditional check matches. The PDAnnotation is removed when I execute the allPageAnnotationsList.remove(annotationTobeRemoved) statement.

But the corresponding text is still displayed in blue. How can I update the text color to normal (black)?

Braiam
  • Please share the PDF and the code. The text color is not related to the annotations. The annotation is about having the link and maybe a border. – Tilman Hausherr Jul 08 '21 at 10:08
  • @TilmanHausherr, Thanks for your immediate response. I have uploaded my code to GitHub (https://github.com/sureshbabukatta/pdfbox-remove-annotation). test-original.pdf is my source PDF containing the annotations; test-after-removing-annotation.pdf is the resulting document after executing my code. Please update the following path according to your local environment: D:\Apps\TestProject\src\main\resources\test.pdf (test-original.pdf) Thanks, Sureshbabu – Sureshbabu Katta Jul 08 '21 at 10:53
  • That file is a test file, so a production file may be completely different. In that one, you can search for "rg" in the content stream to detect the blue color (use PDFDebugger to see what I mean) and remove that operator including its 3 parameters (see the RemoveAllText example; it must be modified). – Tilman Hausherr Jul 08 '21 at 11:08
  • @TilmanHausherr, can you provide some sample code for this? – Sureshbabu Katta Jul 08 '21 at 11:15
  • Sorry no, I don't have the time. But RemoveAllText.java (in the examples subproject of the source code download) is a good start to see how to manipulate the content stream. – Tilman Hausherr Jul 08 '21 at 11:26


Originally I thought you asked for all non-black text on a page to be changed to black. This resulted in my original answer, now the first section 'Updating All Text to Black'. Then you clarified that you only wanted the text in the areas of the removed annotations to be made black. That's shown in the second section 'Updating Text in Areas to Black'.

Updating All Text to Black

First of all, as already described by Tilman in the comments, removing a link annotation usually merely removes the interactivity of that link; the text in the area of the link annotation remains as is. If you want to update the text color to normal (black), therefore, you have to add a second step and manipulate the colors in the static page content.

The static page content is defined by a stream of instructions which change the graphics state or draw something. The color used for drawing is part of the graphics state and is set by explicit color setting instructions. Thus, you might think you could simply replace all color setting instructions by instructions selecting normal (black).

Unfortunately it's not that easy because colors may also be changed to draw other things. For example, in your document the whole page is first filled with white; if you replaced the color setting instruction before that fill instruction, your whole page would be black. Not exactly what you want.
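To make the pitfall concrete, here is a hedged, PDFBox-free sketch that applies the naive idea to a hypothetical, simplified content stream string: every `r g b rg` instruction is replaced by `0 g`. The class name and the content string are invented for illustration; the string merely mimics a page that is first filled with white and then shows blue link text.

```java
import java.util.regex.Pattern;

final class NaiveColorReplacement {
    // A PDF number operand such as 1, 0.5 or .25 (simplified).
    private static final String NUMBER = "-?(?:\\d+\\.?\\d*|\\.\\d+)";

    // Matches an RGB fill color instruction "r g b rg".
    private static final Pattern SET_RGB_FILL =
            Pattern.compile(NUMBER + "\\s+" + NUMBER + "\\s+" + NUMBER + "\\s+rg\\b");

    // Naively replaces every RGB fill color instruction by "0 g" (gray black),
    // without looking at what the color is used for afterwards.
    static String replaceAllFillColorsByBlack(String content) {
        return SET_RGB_FILL.matcher(content).replaceAll("0 g");
    }

    public static void main(String[] args) {
        // Hypothetical page content: white background fill, then blue link text.
        String content = "1 1 1 rg 0 0 612 792 re f BT /F1 12 Tf 0 0 1 rg 72 700 Td (link) Tj ET";
        // prints "0 g 0 0 612 792 re f BT /F1 12 Tf 0 g 72 700 Td (link) Tj ET"
        System.out.println(replaceAllFillColorsByBlack(content));
    }
}
```

The white background fill turns into `0 g ... re f`, i.e. a black page. A regular expression also cannot reliably skip string literals, inline images or other color operators such as `sc`/`scn`, which is one more reason to work with a real content stream parser instead.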

To update the text color to normal (black) without changing other colors, therefore, you have to consider the context of the instructions you want to change.
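What "context" means here is mostly the graphics state in effect at each instruction. As a rough illustration (not PDFBox code), the hypothetical `FillColorTracker` below follows only the fill color through `q`/`Q` (save/restore graphics state) and the device color operators `g`, `rg` and `k`; other operators such as `cs`/`scn` are deliberately ignored, so treat it as a sketch under those assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FillColorTracker {
    // Minimal stand-in for the fill-color part of a PDF graphics state.
    static final class FillColor {
        final String colorSpace;
        final float[] components;

        FillColor(String colorSpace, float... components) {
            this.colorSpace = colorSpace;
            this.components = components.clone();
        }
    }

    // The initial fill color of a PDF graphics state is black in DeviceGray.
    private FillColor current = new FillColor("DeviceGray", 0f);
    private final Deque<FillColor> saved = new ArrayDeque<>();

    // Updates the tracked fill color for a few operators; all others are ignored here.
    void process(String operator, float... operands) {
        switch (operator) {
            case "q":  // save graphics state
                saved.push(current);
                break;
            case "Q":  // restore graphics state
                if (!saved.isEmpty())
                    current = saved.pop();
                break;
            case "g":
                current = new FillColor("DeviceGray", operands);
                break;
            case "rg":
                current = new FillColor("DeviceRGB", operands);
                break;
            case "k":
                current = new FillColor("DeviceCMYK", operands);
                break;
            default:
                break;
        }
    }

    FillColor current() {
        return current;
    }
}
```

Because `Q` can bring back an earlier color without any color operator appearing at that point, a plain text search for `rg` cannot always tell which color is current when text is drawn; a parser that tracks the graphics state can.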

The PDFBox parsing framework can help you here, iterating over a content stream and keeping track of the graphics state.

Based upon that framework, furthermore, a generic content stream editor helper class has been created in this answer, the PdfContentStreamEditor. (For details and example uses see that answer.) Now you merely have to customize it for your use case, e.g. like this:

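// Requires the PdfContentStreamEditor class from the linked answer and the static isBlack(PDColor) helper below.
PDDocument document = ...;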
for (PDPage page : document.getDocumentCatalog().getPages()) {
    PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();

            // Text-showing instruction: if the fill color is not black, remember it and switch to black
            if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                if (currentlyReplacedColor == null)
                {
                    PDColor currentFillColor = getGraphicsState().getNonStrokingColor();
                    if (!isBlack(currentFillColor))
                    {
                        currentlyReplacedColor = currentFillColor;
                        super.write(contentStreamWriter, SET_NON_STROKING_GRAY, GRAY_BLACK_VALUES);
                    }
                }
            } else if (currentlyReplacedColor != null) {
                // Any other instruction after forced-black text: restore the original fill color first
                PDColorSpace replacedColorSpace = currentlyReplacedColor.getColorSpace();
                List<COSBase> replacedColorValues = new ArrayList<>();
                for (float f : currentlyReplacedColor.getComponents())
                    replacedColorValues.add(new COSFloat(f));
                if (replacedColorSpace instanceof PDDeviceCMYK)
                    super.write(contentStreamWriter, SET_NON_STROKING_CMYK, replacedColorValues);
                else if (replacedColorSpace instanceof PDDeviceGray)
                    super.write(contentStreamWriter, SET_NON_STROKING_GRAY, replacedColorValues);
                else if (replacedColorSpace instanceof PDDeviceRGB)
                    super.write(contentStreamWriter, SET_NON_STROKING_RGB, replacedColorValues);
                else {
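                    //TODO: other color spaces (e.g. ICCBased, Separation) are not handled, so the original color is not restored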
                }
                currentlyReplacedColor = null;
            }

            super.write(contentStreamWriter, operator, operands);
        }

        PDColor currentlyReplacedColor = null;

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        final Operator SET_NON_STROKING_CMYK = Operator.getOperator("k");
        final Operator SET_NON_STROKING_RGB = Operator.getOperator("rg");
        final Operator SET_NON_STROKING_GRAY = Operator.getOperator("g");
        final List<COSBase> GRAY_BLACK_VALUES = Arrays.asList(COSInteger.ZERO);
    };
    editor.processPage(page);
}
document.save("withBlackText.pdf");

(ChangeTextColor test testMakeTextBlackTestAfterRemovingAnnotation)

Here we check whether the current instruction is a text drawing instruction. If it is and the current color has not already been replaced, we check whether the current color is already black-ish. If it is not, we store it and add an instruction that sets the fill color to black.

Otherwise, i.e. if the current instruction is not a text drawing instruction, we check whether the current color has been replaced by black. If it has, we restore the original color.
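The replace-and-restore logic described in the two paragraphs above can be simulated without PDFBox. In this hypothetical sketch, instructions are plain strings whose last token is the operator, the fill color is tracked only from `g`, `rg` and `k`, and the black test uses the same thresholds as the `isBlack` helper below. Unlike the answer's code, it re-emits the saved color instruction verbatim instead of rebuilding it from `PDColor` components, and it ignores `q`/`Q`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TextToBlackSketch {
    static final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");

    // The operator is the last whitespace-separated token of an instruction.
    static String operatorOf(String instruction) {
        String[] tokens = instruction.trim().split("\\s+");
        return tokens[tokens.length - 1];
    }

    // Same thresholds as the isBlack(PDColor) helper, applied to "g", "rg" and "k" instructions.
    static boolean isBlack(String colorInstruction) {
        String[] tokens = colorInstruction.trim().split("\\s+");
        float[] c = new float[tokens.length - 1];
        for (int i = 0; i < c.length; i++)
            c[i] = Float.parseFloat(tokens[i]);
        switch (tokens[tokens.length - 1]) {
            case "k":  return (c[0] > .9f && c[1] > .9f && c[2] > .9f) || c[3] > .9f;
            case "g":  return c[0] < .1f;
            case "rg": return c[0] < .1f && c[1] < .1f && c[2] < .1f;
            default:   return false;
        }
    }

    static List<String> makeTextBlack(List<String> instructions) {
        List<String> output = new ArrayList<>();
        String currentFill = "0 g";  // initial fill color of a PDF graphics state: black
        String replacedFill = null;  // original fill color while text is being forced to black
        for (String instruction : instructions) {
            String operator = operatorOf(instruction);
            // Track the fill color (q/Q, cs/scn etc. are ignored in this sketch).
            if (operator.equals("g") || operator.equals("rg") || operator.equals("k"))
                currentFill = instruction;
            if (TEXT_SHOWING_OPERATORS.contains(operator)) {
                if (replacedFill == null && !isBlack(currentFill)) {
                    replacedFill = currentFill;
                    output.add("0 g");        // switch to black before the text
                }
            } else if (replacedFill != null) {
                output.add(replacedFill);     // restore the original color before anything else
                replacedFill = null;
            }
            output.add(instruction);
        }
        return output;
    }
}
```

For the input `1 1 1 rg`, `0 0 612 792 re`, `f`, `BT`, `0 0 1 rg`, `(link) Tj`, `ET`, the background fill stays white, `0 g` is inserted right before `(link) Tj`, and `0 0 1 rg` is re-emitted before `ET`.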

To check whether a given color is black-ish, we use the following helper method.

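// Treats a color as black-ish if it is close to black in DeviceCMYK, DeviceGray or DeviceRGB;
// colors in any other color space are never considered black here.
static boolean isBlack(PDColor pdColor) {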
    PDColorSpace pdColorSpace = pdColor.getColorSpace();
    float[] components = pdColor.getComponents();
    if (pdColorSpace instanceof PDDeviceCMYK)
        return (components[0] > .9f && components[1] > .9f && components[2] > .9f) || components[3] > .9f;
    else if (pdColorSpace instanceof PDDeviceGray)
        return components[0] < .1f;
    else if (pdColorSpace instanceof PDDeviceRGB)
        return components[0] < .1f && components[1] < .1f && components[2] < .1f;
    else
        return false;
}

(ChangeTextColor helper method)

Updating Text in Areas to Black

In comments you clarified that you only want the text in the areas of the removed annotations to become black.

For this you have to collect the rectangles of the annotations you remove and later, before switching colors, check whether the text position lies inside one of those rectangles.
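The bookkeeping for this area check can be sketched without PDFBox. In this hypothetical example, `Link` stands in for a `PDAnnotation` with its `getRectangle()` (assumed to be normalized, lower-left before upper-right), and the removal condition is passed as a `Predicate` (here an invented URI match), in line with the question's conditional removal rather than the every-other-annotation rule used for demonstration in the answer's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class RemovedAnnotationAreas {
    // Hypothetical stand-in for a link annotation: target URI plus a normalized
    // rectangle (lower-left x/y, upper-right x/y) in default user space.
    static final class Link {
        final String uri;
        final float llx, lly, urx, ury;

        Link(String uri, float llx, float lly, float urx, float ury) {
            this.uri = uri;
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        // Inclusive bounds check, similar in spirit to PDRectangle.contains(x, y).
        boolean contains(float x, float y) {
            return x >= llx && x <= urx && y >= lly && y <= ury;
        }
    }

    final List<Link> kept = new ArrayList<>();
    final List<Link> removed = new ArrayList<>();

    // Removes every annotation matching the condition and remembers its area.
    RemovedAnnotationAreas(List<Link> annotations, Predicate<Link> shouldRemove) {
        for (Link link : annotations) {
            if (shouldRemove.test(link))
                removed.add(link);
            else
                kept.add(link);
        }
    }

    // True if the point lies inside the area of any removed annotation.
    boolean isInAreas(float x, float y) {
        return removed.stream().anyMatch(link -> link.contains(x, y));
    }
}
```

With PDFBox, the kept annotations would go back to the page via `page.setAnnotations(...)`, and the removed rectangles would serve as the `areas` the editor checks text positions against.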

This can be done by extending the code from the first section as follows. Here I remove only every other annotation and collect their rectangles to check against later. I also override the PDFStreamEngine method showText(byte[]) to record the position of the text shown by the current text drawing instruction.

PDDocument document = ...;
for (PDPage page : document.getDocumentCatalog().getPages()) {
    List<PDRectangle> areas = new ArrayList<>();
    // Remove every other annotation, collect their areas
    List<PDAnnotation> annotations = new ArrayList<>();
    boolean remove = true;
    for (PDAnnotation annotation : page.getAnnotations()) {
        if (remove)
            areas.add(annotation.getRectangle());
        else
            annotations.add(annotation);
        remove = !remove;
    }
    page.setAnnotations(annotations);

    PdfContentStreamEditor editor = new PdfContentStreamEditor(document, page) {
        @Override
        protected void write(ContentStreamWriter contentStreamWriter, Operator operator, List<COSBase> operands) throws IOException {
            String operatorString = operator.getName();

            // Text-showing instruction inside a removed annotation's area: force black if needed
            if (TEXT_SHOWING_OPERATORS.contains(operatorString) && isInAreas()) {
                if (currentlyReplacedColor == null)
                {
                    PDColor currentFillColor = getGraphicsState().getNonStrokingColor();
                    if (!isBlack(currentFillColor))
                    {
                        currentlyReplacedColor = currentFillColor;
                        super.write(contentStreamWriter, SET_NON_STROKING_GRAY, GRAY_BLACK_VALUES);
                    }
                }
            } else if (currentlyReplacedColor != null) {
                // Any other instruction after forced-black text: restore the original fill color first
                PDColorSpace replacedColorSpace = currentlyReplacedColor.getColorSpace();
                List<COSBase> replacedColorValues = new ArrayList<>();
                for (float f : currentlyReplacedColor.getComponents())
                    replacedColorValues.add(new COSFloat(f));
                if (replacedColorSpace instanceof PDDeviceCMYK)
                    super.write(contentStreamWriter, SET_NON_STROKING_CMYK, replacedColorValues);
                else if (replacedColorSpace instanceof PDDeviceGray)
                    super.write(contentStreamWriter, SET_NON_STROKING_GRAY, replacedColorValues);
                else if (replacedColorSpace instanceof PDDeviceRGB)
                    super.write(contentStreamWriter, SET_NON_STROKING_RGB, replacedColorValues);
                else {
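                    //TODO: other color spaces (e.g. ICCBased, Separation) are not handled, so the original color is not restored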
                }
                currentlyReplacedColor = null;
            }

            super.write(contentStreamWriter, operator, operands);

            // Reset the text positions recorded by showText for the next instruction
            before = null;
            after = null;
        }

        PDColor currentlyReplacedColor = null;

        final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        final Operator SET_NON_STROKING_CMYK = Operator.getOperator("k");
        final Operator SET_NON_STROKING_RGB = Operator.getOperator("rg");
        final Operator SET_NON_STROKING_GRAY = Operator.getOperator("g");
        final List<COSBase> GRAY_BLACK_VALUES = Arrays.asList(COSInteger.ZERO);

        // Records where the text of the current instruction starts and ends (text matrix times CTM)
        @Override
        protected void showText(byte[] string) throws IOException {
            Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
            if (before == null)
                before = getTextMatrix().multiply(ctm);
            super.showText(string);
            after = getTextMatrix().multiply(ctm);
        }

        Matrix before = null;
        Matrix after = null;

        boolean isInAreas() {
            return isInAreas(before) || isInAreas(after);
        }
        boolean isInAreas(Matrix m) {
            return m != null && areas.stream().anyMatch(rect -> rect.contains(m.getTranslateX(), m.getTranslateY()));
        }
    };
    editor.processPage(page);
}
document.save("WithoutSomeAnnotation-withBlackTextThere.pdf");
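The position check in `showText` relies on the translation part of the text matrix multiplied by the current transformation matrix. As a hedged, PDFBox-free illustration of that arithmetic, here is a hypothetical `Affine` class for PDF's six-number matrices `[a b c d e f]`; its `multiply` computes `this × other` in the row-vector convention PDF uses, which is what `getTextMatrix().multiply(ctm)` in the code above is assumed to compute.

```java
final class TextPositionMath {
    // Hypothetical PDF-style affine matrix [a b c d e f], i.e.
    // | a b 0 |
    // | c d 0 |
    // | e f 1 |
    static final class Affine {
        final double a, b, c, d, e, f;

        Affine(double a, double b, double c, double d, double e, double f) {
            this.a = a; this.b = b; this.c = c;
            this.d = d; this.e = e; this.f = f;
        }

        // Returns this × other (row-vector convention: point' = point × matrix).
        Affine multiply(Affine o) {
            return new Affine(
                    a * o.a + b * o.c,
                    a * o.b + b * o.d,
                    c * o.a + d * o.c,
                    c * o.b + d * o.d,
                    e * o.a + f * o.c + o.e,
                    e * o.b + f * o.d + o.f);
        }
    }

    // Page position of the text origin: translation part of textMatrix × ctm,
    // analogous to getTranslateX()/getTranslateY() of the product in the answer.
    static double[] pagePosition(Affine textMatrix, Affine ctm) {
        Affine m = textMatrix.multiply(ctm);
        return new double[] {m.e, m.f};
    }

    public static void main(String[] args) {
        Affine ctm = new Affine(2, 0, 0, 2, 10, 20);         // scale by 2, then move by (10, 20)
        Affine textMatrix = new Affine(1, 0, 0, 1, 72, 700); // text positioned at (72, 700)
        double[] p = pagePosition(textMatrix, ctm);
        System.out.println(p[0] + ", " + p[1]);              // prints 154.0, 1420.0
    }
}
```

Checking the position both before and after the instruction, as the answer does, catches text that starts or ends inside an annotation rectangle; a long string that starts and ends outside a small rectangle but passes through it would still be missed.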
mkl
  • Thanks for your response. I have tried this code block, but it converts all the contents to black, which is not the expected case. I have a few URI action link annotations in my source PDF, and I have removed a couple of them based on some condition. I need the text/content to be black only for the contents of these removed annotations. – Sureshbabu Katta Jul 12 '21 at 10:10
  • *"I have tried this code block, but it converts all the contents to black, which is not the expected case."* - That's unfortunate; I understood your question to mean that you wanted to update the color of all text to black. – mkl Jul 12 '21 at 12:41
  • *"I need the text/content to be black only for the contents of these removed annotations."* - In that case you have to collect the rectangles of the annotations you removed and check the position before switching colors. Please supply a representative example for your task (your shared file obviously wasn't representative); I'll try and see whether that's easy to generalize. – mkl Jul 12 '21 at 12:47
  • Thank you for your quick response. I have uploaded a new example PDF document, Test2.pdf, at https://github.com/sureshbabukatta/pdfbox-remove-annotation . It has multiple annotations. The requirement here is that a couple of annotations can be removed, sequentially or randomly; the text or content in the rectangular areas of the removed annotations must then be black. – Sureshbabu Katta Jul 12 '21 at 14:18
  • Could you find time to look at the new document provided at https://github.com/sureshbabukatta/pdfbox-remove-annotation ? – Sureshbabu Katta Jul 13 '21 at 13:25
  • Thank you so much @mkl, your latest solution updating text in areas has worked for our scenario. – Sureshbabu Katta Jul 14 '21 at 12:55