I have successfully changed the color of the underlines using the code from the link below. Can anyone help me remove those underlines from the PDF? I find the underlines using that same code.

Traverse whole PDF and change blue color to black ( Change color of underlines as well) + iText

Below is my code, which finds hyperlinks and changes their color to black. I need to modify it so that it removes the underlines instead.

PdfCanvasEditor editor = new PdfCanvasEditor() {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
            if (isApproximatelyEqual(operands.get(0), 0) &&
                    isApproximatelyEqual(operands.get(1), 0) &&
                    isApproximatelyEqual(operands.get(2), 1)) {
                super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                return;
            }
        }

        if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
            if (isApproximatelyEqual(operands.get(0), 0) &&
                    isApproximatelyEqual(operands.get(1), 0) &&
                    isApproximatelyEqual(operands.get(2), 1)) {
                super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                return;
            }
        }

        super.write(processor, operator, operands);
    }

    boolean isApproximatelyEqual(PdfObject number, float reference) {
        return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
    }

    final String SET_FILL_RGB = "rg";
    final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
    editor.editPage(pdfDocument, i);
}

Edit:

The accepted answer does not work for the following files:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (Page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (Page 60)

Please help.

Asad Rao

1 Answer

As described in a comment in the context of the referenced question:

it is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions by instructions dropping the current path without drawing it. If only doing so in case of the applicable current color being blue, that would likely do the job in case of your example PDFs. But beware, in documents with other graphics with blue elements (e.g. logos), these would be mutilated, too.

This is what the following content editor does:

class PdfGraphicsRemoverByColor extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColor(Color color) {
        this.color = color;
    }

    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

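        // Matching fill color: remove the fill part of each path-painting operator.
        if (color.equals(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";   // fill only: end the path without painting
                break;
            case "b":
            case "b*":
                operatorString = "s";   // close, fill, stroke: keep close and stroke
                break;
            case "B":
            case "B*":
                operatorString = "S";   // fill and stroke: keep the stroke
                break;
            }
        }

        // Matching stroke color: remove the stroke part of whatever operator is left.
        if (color.equals(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";   // stroke only: end the path without painting
                break;
            case "b":
            case "B":
                operatorString = "f";   // keep the nonzero winding number fill
                break;
            case "b*":
            case "B*":
                operatorString = "f*";  // keep the even-odd fill
                break;
            }
        }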

        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }

    final Color color;
}

(RemoveGraphicsByColor helper class)
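
To make it easier to see how the two `switch` statements combine, the operator rewriting can be isolated into a plain function without any iText types. This is only a sketch for illustration; the class name `PaintOperatorMapper` and the two boolean parameters are hypothetical and stand in for the two color comparisons in the editor above.

```java
/** Stand-alone restatement of the operator rewriting, without iText types (illustrative only). */
final class PaintOperatorMapper {
    private PaintOperatorMapper() { }

    /**
     * @param operator      path-painting operator read from the content stream
     * @param fillMatches   true if the current fill color is the color to remove
     * @param strokeMatches true if the current stroke color is the color to remove
     * @return the operator to write instead (unchanged if nothing applies)
     */
    static String map(String operator, boolean fillMatches, boolean strokeMatches) {
        String result = operator;
        if (fillMatches) {
            switch (result) {
                case "f": case "f*": case "F":
                    result = "n";   // fill only: end the path without painting
                    break;
                case "b": case "b*":
                    result = "s";   // keep close and stroke, drop the fill
                    break;
                case "B": case "B*":
                    result = "S";   // keep the stroke, drop the fill
                    break;
                default:
                    break;
            }
        }
        if (strokeMatches) {
            // Works on the result of the first step, so "B" can become "S" and then "n".
            switch (result) {
                case "s": case "S":
                    result = "n";
                    break;
                case "b": case "B":
                    result = "f";
                    break;
                case "b*": case "B*":
                    result = "f*";
                    break;
                default:
                    break;
            }
        }
        return result;
    }
}
```

For example, `PaintOperatorMapper.map("B", true, true)` first becomes `S` and then `n`, so a path that is both filled and stroked in the matching color disappears completely, while `map("B*", false, true)` yields `f*` and keeps the fill.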

Applied like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColor(ColorConstants.BLUE);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(RemoveGraphicsByColor tests)

to the example files Control_of_nitrosamine_impurities_in_sartans__rev.pdf, EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf, and originalFile.pdf from the referenced question, one gets the following results (screenshots not reproduced here):

  • Control_of_nitrosamine_impurities_in_sartans__rev.pdf

  • EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf

  • originalFile.pdf

Beware, this is merely a proof-of-concept, not a final and complete solution. In particular:

  • Only RGB blue is considered. This might be an issue particularly in case of documents explicitly designed for printing (likely using CMYK colors).

  • All path fills and strokes are dropped as long as they were blue. Depending on your documents this may have to be filtered.

  • PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.

Different shades of blue from other RGB'ish color spaces

Testing the code above, you found documents in which the blue lines were not removed. As it turned out, these blue colors were not from the standard DeviceRGB color space but from ICCBased color spaces, profiled RGB color spaces to be more exact. Furthermore, in one document not a pure blue 0 0 1 but instead a .17255 .3098 .63529 blue was used.

To also be able to deal with these documents, the approach above must be generalized. For example, we can use a Predicate<Color> instead of a single, specific Color, like this:

class PdfGraphicsRemoverByColorPredicate extends PdfCanvasEditor {
    public PdfGraphicsRemoverByColorPredicate(Predicate<Color> colorPredicate) {
        this.colorPredicate = colorPredicate;
    }

    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (colorPredicate.test(getGraphicsState().getFillColor())) {
            switch (operatorString) {
            case "f":
            case "f*":
            case "F":
                operatorString = "n";
                break;
            case "b":
            case "b*":
                operatorString = "s";
                break;
            case "B":
            case "B*":
                operatorString = "S";
                break;
            }
        }

        if (colorPredicate.test(getGraphicsState().getStrokeColor())) {
            switch (operatorString) {
            case "s":
            case "S":
                operatorString = "n";
                break;
            case "b":
            case "B":
                operatorString = "f";
                break;
            case "b*":
            case "B*":
                operatorString = "f*";
                break;
            }
        }

        operator = new PdfLiteral(operatorString);
        operands.set(operands.size() - 1, operator);
        super.write(processor, operator, operands);
    }

    final Predicate<Color> colorPredicate;
}

(RemoveGraphicsByColor helper class)

Applied like this:

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfGraphicsRemoverByColorPredicate(RemoveGraphicsByColor::isRgbBlue);
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(RemoveGraphicsByColor testRemoveAllBlueLinesFrom* tests)

to the new example files, using this predicate method:

public static boolean isRgbBlue(Color color) {
    if (color instanceof CalRgb || color instanceof DeviceRgb || (color instanceof IccBased && color.getNumberOfComponents() == 3)) {
        float[] components = color.getColorValue();
        float r = components[0];
        float g = components[1];
        float b = components[2];
        return b > .5f && r < .9f*b && g < .9f*b;
    }
    return false;
}

(RemoveGraphicsByColor helper method)
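
As a worked example of how the threshold in this predicate behaves, the comparison can be tried on plain float values. The sketch below is not part of the answer's code: the class `BlueThresholdDemo` and its `factor` parameter are hypothetical, introduced only so that the `.9f` used above can be compared with the stricter factors mentioned in the comments.

```java
/** Illustrative restatement of the blue test with an adjustable factor (hypothetical helper). */
final class BlueThresholdDemo {
    private BlueThresholdDemo() { }

    /**
     * Same comparison as in isRgbBlue, but with the factor made a parameter.
     * A smaller factor demands that red and green are further below blue.
     */
    static boolean isBlue(float r, float g, float b, float factor) {
        return b > .5f && r < factor * b && g < factor * b;
    }
}
```

With the factor `.9f`, both the pure blue `0 0 1` and the `.17255 .3098 .63529` blue from the second document pass, because `.9 * .63529` is about `.572` and both other components are below it. A violet such as `.6 0 1` also passes with `.9f`, since `.6 < .9`. With the stricter factor `.513f` the document's blue still passes (`.513 * .63529` is about `.326`, just above the green value `.3098`), while the violet no longer does.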

one gets the following results (screenshots not reproduced here):

  • 021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf

  • 400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf

Beware, the warnings from above still apply.
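
One of those warnings concerns CMYK colors. A rough way to extend the blue test to four-component colors is to convert CMYK to RGB first and then apply the same comparison. The sketch below works on plain float values and assumes the naive, profile-free conversion `r = (1 - c) * (1 - k)` (and likewise for green and blue); real documents may use ICC profiles for which this is only an approximation. The class `CmykBlueCheck` is hypothetical; in the editor the four values would presumably come from `getColorValue()` of a four-component color such as iText's `DeviceCmyk`.

```java
/** Illustrative CMYK variant of the blue test (naive conversion, no ICC profiles). */
final class CmykBlueCheck {
    private CmykBlueCheck() { }

    /** Naive CMYK to RGB conversion; all components are expected in the range 0..1. */
    static float[] cmykToRgb(float c, float m, float y, float k) {
        return new float[] {
            (1 - c) * (1 - k),
            (1 - m) * (1 - k),
            (1 - y) * (1 - k)
        };
    }

    /** Applies the same comparison as isRgbBlue to the converted color. */
    static boolean isCmykBlue(float c, float m, float y, float k) {
        float[] rgb = cmykToRgb(c, m, y, k);
        float r = rgb[0];
        float g = rgb[1];
        float b = rgb[2];
        return b > .5f && r < .9f * b && g < .9f * b;
    }
}
```

Under this conversion the CMYK color `1 1 0 0` maps to RGB `0 0 1` and counts as blue, whereas cyan `1 0 0 0` maps to `0 1 1` and does not.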

mkl
  • @AsadRao Great! In that case you should probably mark the answer as "accepted" (click the tick at its upper left, right under the voting arrows). – mkl Sep 23 '19 at 12:31
  • I have added two new files to the question description. The code is not working for them; can you please help me with that? – Asad Rao Sep 23 '19 at 13:02
  • @AsadRao As explained in comments in the context of the referenced question, the two new files don't use **DeviceRGB** but instead some ICCBased color spaces. Furthermore, the second document does not use a clear blue but instead `.17255 .3098 .63529`. Thus, the test `color.equals(getGraphicsState().getFillColor())` needs to be considerably generalized (to be able to compare across color spaces) and softened (to recognize a dirty blue, too). – mkl Sep 23 '19 at 14:40
  • So what should be the if statement then? – Asad Rao Sep 23 '19 at 15:10
  • @AsadRao Have you asked your project manager which blues to support? – mkl Sep 23 '19 at 15:47
  • In most cases it would be the kinds of blue that appear in the files above. We can ignore other blues, or I will add more blues from time to time if needed, but for now I want a solution for the three blues above. – Asad Rao Sep 23 '19 at 15:54
  • Ok. I'll look into that tomorrow. – mkl Sep 23 '19 at 18:42
  • Thank you very much for your answer. But as you already said, it can cause problems: it also removes color from logos and graphs. I ran the code on a file that contains a graph, and it changed the blue color there as well, which is not good. Anyway, many thanks @mkl. One last thing: I have concluded that I will change the color of text only if it starts with https or http, because my goal is to change the color of links. Can you please modify the code one last time so that it changes the color only of text that contains https or http? – Asad Rao Sep 25 '19 at 14:35
  • In the color predicate, can we match shades of blue only? I want to narrow the range of isRgbBlue(Color color). It is converting even purple colors. – Asad Rao Oct 30 '19 at 14:23
  • *"can we just mentioned shades of blue only"* - you can try to be more restrictive, e.g. replace `r < .9f*b && g < .9f*b` by `r < .75f*b && g < .75f*b`. But there are hardly any fixed limits on RGB blueness, see [this answer](https://stackoverflow.com/a/17670830/1729265) which essentially switches to HSV colors and even there can only give approximate ranges; finally that answer goes back to RGB and proposes `if( max( red, green, blue) == blue)` which is even less strict than the `b > .5f && r < .9f*b && g < .9f*b` above and so categorizes even more colors as blue... – mkl Oct 30 '19 at 14:41
  • Thanks, I have narrowed down the value. This value seems OK to me: return b > .5f && r < .513f*b && g < .513f*b; – Asad Rao Oct 31 '19 at 10:55
  • One thing I noticed: our code is removing red dotted lines as well. Why? See the links: [Original File] https://raad-dev-test.s3.ap-south-1.amazonaws.com/raw/WebScraping/RegulatoryInformation/dev/North+America/USA/2019-10-31/2841/_FDA_Requires_Use_of_eCTD_Format_and_Standardized_Study_Data_in_Future_Regulatory_Submissions__Sept.pdf [Manipulated File] https://raad-dev-test.s3.ap-south-1.amazonaws.com/raw/WebScraping/RegulatoryInformation/dev/North+America/USA/2019-10-31/2841/_FDA_Requires_Use_of_eCTD_Format_and_Standardized_Study_Data_in_Future_Regulatory_Submissions__Sept_Mac.pdf – Asad Rao Oct 31 '19 at 10:57
  • *"Our code is removing ..."* - I applied the `PdfGraphicsRemoverByColorPredicate` with the `RemoveGraphicsByColor::isRgbBlue` from my answer to your [_FDA_Requires_Use_of_eCTD_Format_and_Standardized_Study_Data_in_Future_Regulatory_Submissions__Sept.pdf](https://raad-dev-test.s3.ap-south-1.amazonaws.com/raw/WebScraping/RegulatoryInformation/dev/North+America/USA/2019-10-31/2841/_FDA_Requires_Use_of_eCTD_Format_and_Standardized_Study_Data_in_Future_Regulatory_Submissions__Sept.pdf) and the red dotted line did not vanish. Thus, please explain exactly how to reproduce that issue. – mkl Oct 31 '19 at 15:55
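
The hue-based idea mentioned in the comments can be sketched with the JDK's own `java.awt.Color.RGBtoHSB`. This is only an illustration on plain RGB values, not code from the answer: the class `HueBlueCheck`, the hue window of 200° to 260°, and the minimum saturation and brightness of `.3` are all assumptions that would need tuning against real documents.

```java
import java.awt.Color;

/** Illustrative hue-based blue test (thresholds are assumptions, not from the thread). */
final class HueBlueCheck {
    private HueBlueCheck() { }

    /** Expects RGB components in the range 0..1, as returned by getColorValue(). */
    static boolean isBlueByHue(float r, float g, float b) {
        // RGBtoHSB works on 0..255 integers and returns hue, saturation, brightness in 0..1.
        float[] hsb = Color.RGBtoHSB(
                Math.round(r * 255), Math.round(g * 255), Math.round(b * 255), null);
        float hueDegrees = hsb[0] * 360f;
        boolean hueIsBlue = hueDegrees >= 200f && hueDegrees <= 260f;
        // Very pale or very dark colors have no meaningful hue, so they are excluded.
        return hueIsBlue && hsb[1] >= .3f && hsb[2] >= .3f;
    }
}
```

Pure blue `0 0 1` has a hue of 240° and the `.17255 .3098 .63529` blue has a hue of roughly 222°, so both fall inside the assumed window, while a violet such as `.6 0 1` (about 276°) falls outside it. As the comment above notes, any such range is approximate.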