
I am using the code below to remove blue colors from PDF text. It works fine for the text itself: the text color changes correctly, but the color of the underlines does not change.

Original file (excerpt):

[screenshot: Original File]

Manipulated file:

[screenshot: Manipulated File]

As you can see in the manipulated file above, the underline color did not change.

I have been looking for a fix for this for two weeks. Can anyone help? Below is my color-changing code:

// Imports for the method below (iText 7.1.x). PdfCanvasEditor is not part of iText;
// it is the helper class from the answer listed under "Related Links".
import com.itextpdf.kernel.colors.Color;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.colors.DeviceCmyk;
import com.itextpdf.kernel.colors.DeviceGray;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfLiteral;
import com.itextpdf.kernel.pdf.PdfNumber;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
    try (PdfReader pdfReader = new PdfReader(source);
            OutputStream result = new FileOutputStream(filename);
            PdfWriter pdfWriter = new PdfWriter(result);
            PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter)) {
        PdfCanvasEditor editor = new PdfCanvasEditor() {

            @Override
            protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {
                String operatorString = operator.toString();

                if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                    // Text is about to be shown: if its fill color is green, cyan or blue, switch to black first.
                    if (currentlyReplacedColor == null) {
                        Color currentFillColor = getGraphicsState().getFillColor();
                        if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
                            currentlyReplacedColor = currentFillColor;
                            writeFillColor(processor, ColorConstants.BLACK);
                        }
                    }
                } else if (currentlyReplacedColor != null) {
                    // First non-text instruction after recolored text: restore the original fill color
                    // (the old code wrote "0 0 0 0 k" here for CMYK, which is white, not the original color).
                    writeFillColor(processor, currentlyReplacedColor);
                    currentlyReplacedColor = null;
                }

                super.write(processor, operator, operands);
            }

            // Writes a "g", "rg" or "k" instruction setting the given fill color.
            void writeFillColor(PdfCanvasProcessor processor, Color color) {
                String op = color instanceof DeviceCmyk ? "k" : color instanceof DeviceGray ? "g" : "rg";
                List<PdfObject> instruction = new ArrayList<>();
                for (float component : color.getColorValue()) {
                    instruction.add(new PdfNumber(component));
                }
                // PdfCanvasEditor writes the operand list verbatim, so it must end with the operator.
                instruction.add(new PdfLiteral(op));
                super.write(processor, new PdfLiteral(op), instruction);
            }

            Color currentlyReplacedColor = null;

            final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        };
        for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
            editor.editPage(pdfDocument, i);
        }
    }
    File file = new File(source);
    file.delete();
}

Here is the original file: https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf

Related Links:

Traverse whole PDF and change some attribute with some object in it using iText

Removing Watermark from PDF iTextSharp

Maven dependency details:

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itext7-core</artifactId>
        <version>7.1.5</version>
        <type>pom</type>
    </dependency>

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.0.6</version>
    </dependency>

Edit:

The accepted answer is not working for the following files:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (page 60)

Please help.

Asked by Asad Rao
  • Or can anyone provide me some code to which I just pass files like these and which changes the color of all hyperlinks in them from blue to black and removes the underlines as well? Thanks. – Asad Rao Sep 18 '19 at 13:33
  • The answer for which your code was created refers to a question about changing the color of text, not of lines. Thus, your observation is to be expected. The code is easy to change if everything in that particular blue is to be replaced by black; in general, though, there is no difference between text underlines and arbitrary line graphics. Thus, recoloring only blue underlines but not other blue lines requires more specific knowledge. – mkl Sep 18 '19 at 18:54
  • You are right @mkl, but it is not removing the lines; I have tried all the possible solutions. – Asad Rao Sep 19 '19 at 13:33
  • I see you use code for iText 7.x; on the other hand, you neither tagged your question itext7 nor mentioned the version in your question. So are you looking for an iText 7.x solution? Or an iText 5.x solution? Or would either one be OK? – mkl Sep 19 '19 at 14:21

1 Answer

(The example code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in tags or question text but your example code appears to indicate that this is your combination of choice.)

Replacing blue fill colors

The test you based your original code on attempts explicitly only to change text color. The "underline" in your document, though, is (as far as PDF drawing is concerned) not part of the text but instead drawn as a simple path. Thus, the underline explicitly is not touched by the original code and it has to be adapted for your task.

But actually your task, changing everything blue to black, is easier to implement than only changing the blue text, e.g.

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }
            
            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor test testChangeFillRgbBlueToBlack)

Beware, this is merely a proof-of-concept, not a final and complete solution. In particular:

  • It merely looks at the fill (non-stroking) colors. In your case that suffices as both your text (as usual) and your underline use fill colors only - the underline actually is not drawn as a stroked line but instead as a slim, filled rectangle.
  • Only RGB blue (and only such blue set using the rg instruction, not set using sc or scn, let alone blues combined out of other colors using funky blend modes) is considered. This might be an issue particularly in case of documents explicitly designed for printing (likely using CMYK colors).
  • PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed form XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.
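The first point depends on which current color a path-painting operator actually uses. The following plain-Java sketch (no iText involved; the class and method names are made up for illustration) encodes the path-painting operators of the PDF specification: fill operators use the non-stroking color set by rg/g/k/sc/scn, stroke operators use the stroking color set by RG/G/K/SC/SCN, the B/b family uses both, and n paints nothing.

```java
import java.util.Set;

// Hypothetical helper: which current color a PDF path-painting operator consumes.
class PaintingOperators {
    // Fill-only operators paint with the non-stroking (fill) color.
    static final Set<String> FILL_ONLY = Set.of("f", "F", "f*");
    // Stroke-only operators paint with the stroking color.
    static final Set<String> STROKE_ONLY = Set.of("S", "s");
    // Fill-and-stroke operators use both colors.
    static final Set<String> FILL_AND_STROKE = Set.of("B", "B*", "b", "b*");

    static boolean usesFillColor(String op) {
        return FILL_ONLY.contains(op) || FILL_AND_STROKE.contains(op);
    }

    static boolean usesStrokeColor(String op) {
        return STROKE_ONLY.contains(op) || FILL_AND_STROKE.contains(op);
    }

    public static void main(String[] args) {
        // "x y w h re f": an underline drawn as a slim filled rectangle.
        System.out.println("f uses fill: " + usesFillColor("f") + ", stroke: " + usesStrokeColor("f"));
        // "x1 y1 m x2 y2 l S": an underline drawn as a stroked line.
        System.out.println("S uses fill: " + usesFillColor("S") + ", stroke: " + usesStrokeColor("S"));
    }
}
```

So for an underline drawn as `x y w h re f`, only the fill color matters and rewriting `rg` is enough, while a line drawn with `m`, `l` and `S` would also need the `RG` case.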

The result:

[screenshot]

Replacing blue fill and stroke colors

Testing the code above, you soon found documents in which the underlines were not changed. As it turned out, these underlines are actually drawn as stroked lines, not as filled rectangles as above.

To also properly edit such documents, therefore, you must not only edit the fill colors but also the stroke colors, e.g. like this:

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
        final String SET_STROKE_RGB = "RG";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)

The results:

Control_of_nitrosamine_impurities_in_sartans__rev.pdf

and

EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf
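Stripped of the iText parsing, the editor above performs a token substitution on the content stream: `0 0 1 rg` becomes `0 g` and `0 0 1 RG` becomes `0 G`, with the same 0.01 tolerance. The toy sketch below (hypothetical class `ToyBlueToBlack`) shows this on content streams given as whitespace-separated strings. That is a strong simplifying assumption: real content streams contain strings with spaces, arrays, inline images and comments, which is exactly why the answer lets `PdfCanvasProcessor` do the parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the fill-and-stroke editor; operands are assumed to be numbers or names only.
class ToyBlueToBlack {
    // Same tolerance as isApproximatelyEqual in the editor above.
    static boolean approx(String token, float reference) {
        try {
            return Math.abs(Float.parseFloat(token) - reference) < 0.01f;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isOperand(String token) {
        return token.startsWith("/") || token.matches("[-+]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    static String rewrite(String contentStream) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : contentStream.trim().split("\\s+")) {
            if (isOperand(token)) {
                operands.add(token);
                continue;
            }
            boolean rgbSetter = token.equals("rg") || token.equals("RG");
            if (rgbSetter && operands.size() == 3
                    && approx(operands.get(0), 0) && approx(operands.get(1), 0) && approx(operands.get(2), 1)) {
                // Pure blue: replace by DeviceGray black, keeping fill (g) vs stroke (G).
                out.add("0");
                out.add(token.equals("rg") ? "g" : "G");
            } else {
                out.addAll(operands);
                out.add(token);
            }
            operands.clear();
        }
        out.addAll(operands); // leftover operands without an operator (malformed input)
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("0 0 1 rg 72 698 128 1 re f"));         // underline as filled rectangle
        System.out.println(rewrite("0 0 1 RG 0.5 w 72 700 m 200 700 l S")); // underline as stroked line
    }
}
```

A fill-only version would leave the second (stroked) underline blue, which is the situation described above.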

Replacing different shades of blue from other RGB'ish color spaces

Testing the code above, you again found documents in which the blue colors were not changed. As it turned out, these blue colors were not from the standard DeviceRGB color space but from ICCBased color spaces, profiled RGB color spaces to be more exact. In particular, different color setting operators were used than before: sc / scn instead of rg. Furthermore, one document did not use a pure blue 0 0 1 but the blue .17255 .3098 .63529.

If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB color, as they do here (in general this is an oversimplification: Lab and other color spaces also come with 3 components, but your documents seem RGB oriented), and if we are less strict in recognizing the blue color, we can generalize the code above as follows:

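import com.itextpdf.kernel.pdf.PdfLiteral;
import com.itextpdf.kernel.pdf.PdfNumber;
import com.itextpdf.kernel.pdf.PdfObject;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// PdfCanvasEditor is the helper class used throughout this thread (not part of iText itself).
class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        // Three color components plus the operator itself: assumed to be some RGB flavor.
        if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
            if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
                // Keep the operator (and thus the color space), only set the components to 0 0 0.
                PdfNumber number0 = new PdfNumber(0);
                operands.set(0, number0);
                operands.set(1, number0);
                operands.set(2, number0);
            }
        }

        super.write(processor, operator, operands);
    }

    // Heuristic: clearly dominant blue component, red and green each below 90% of it.
    boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
        if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
            float r = ((PdfNumber)red).floatValue();
            float g = ((PdfNumber)green).floatValue();
            float b = ((PdfNumber)blue).floatValue();
            return b > .5f && r < .9f*b && g < .9f*b;
        }
        return false;
    }

    final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}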

(ChangeColor helper class)

Used like this

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
    PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

we get

021549Orig1s025_aprepitant_clinpharm_prea_Mac-AllRgbBlueToBlack.pdf

and

400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac-AllRgbBlueToBlack.pdf
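To see which colors the generalized converter blackens, here is a plain-Java restatement of its decision logic (hypothetical class `BlueHeuristic`, with the same operator set and thresholds as `AllRgbBlueToBlackConverter`, but with the color components given as plain floats instead of `PdfObject`s), applied to the blues mentioned above.

```java
import java.util.List;
import java.util.Set;

// Hypothetical restatement of AllRgbBlueToBlackConverter's test, without iText.
class BlueHeuristic {
    // Operators assumed to set an RGB-like color when given exactly three numbers.
    static final Set<String> RGB_SETTER_CANDIDATES = Set.of("rg", "RG", "sc", "SC", "scn", "SCN");

    // Same thresholds as isBlue in the converter.
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    // Mirrors "operands.size() == 4" in the editor, whose operand list also contains the operator.
    static boolean wouldBeBlackened(String operator, List<Float> components) {
        return RGB_SETTER_CANDIDATES.contains(operator)
                && components.size() == 3
                && isBlue(components.get(0), components.get(1), components.get(2));
    }

    public static void main(String[] args) {
        System.out.println(wouldBeBlackened("rg", List.of(0f, 0f, 1f)));                // pure DeviceRGB blue
        System.out.println(wouldBeBlackened("scn", List.of(.17255f, .3098f, .63529f))); // ICCBased blue
        System.out.println(wouldBeBlackened("scn", List.of(.4f, .4f, .45f)));          // grayish, too dark
        System.out.println(wouldBeBlackened("k", List.of(1f, 0f, 0f)));                // not a candidate
    }
}
```

Note that a muted blue such as 0.27 0.51 0.71 passes as well (0.71 > 0.5, and both 0.27 and 0.51 are below 0.9 × 0.71 ≈ 0.64), so blue logos or chart elements get blackened, too.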

Answered by mkl
  • You saved me. It is working perfectly for the file above. But I have to build a general solution, and it is not working for some files. Can you please look into them? I shall be very thankful to you. Thanks for the reply. Here are the files for which this code is not working properly: https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/Control_of_nitrosamine_impurities_in_sartans__rev.pdf https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf – Asad Rao Sep 20 '19 at 08:22
  • I have added dependency details above. Thanks – Asad Rao Sep 20 '19 at 08:31
  • In contrast to the first document, in your two new documents the underlines actually are drawn as stroked lines, not as a filled rectangle as above. The code is easy to make work for them, too. Concerning the dependencies: For the example code in my answer, you only need the iText 7 dependencies. – mkl Sep 20 '19 at 08:48
  • Can you please guide me on how to change that? I will change my dependencies if it does not work with my current ones. – Asad Rao Sep 20 '19 at 09:17
  • @AsadRao See the new section "Replacing blue fill and stroke colors" of my answer. – mkl Sep 20 '19 at 09:37
  • @mkl Man, you are a genius. It is working perfectly fine. Thanks a lot. Just a side question: can we remove the underlines? – Asad Rao Sep 20 '19 at 09:42
  • Well, it is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions with instructions that drop the current path without drawing it. Doing so only if the applicable current color is blue would likely do the job for your example PDFs. But beware: in documents with other graphics containing blue elements (e.g. logos), these would be mutilated, too. – mkl Sep 20 '19 at 10:04
  • Fine, got it. Can you share a sample for that as well? It would be really helpful for me. Thanks a lot. – Asad Rao Sep 20 '19 at 10:11
  • Later probably. – mkl Sep 20 '19 at 10:16
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199724/discussion-between-asad-rao-and-mkl). – Asad Rao Sep 20 '19 at 10:22
  • I have added two new files to the question description. The code is not working for them; can you please help me with it? – Asad Rao Sep 23 '19 at 13:02
  • @AsadRao Those two files are examples of the restriction "only such blue set using the **rg** instruction, not set using **sc** or **scn**" I warned about in my answer. Those blues are not from the **DeviceRGB** standard RGB but instead from ICCBased colorspaces. You could, for a work-around, assume all **sc** and **scn** instructions with three numeric parameters to essentially set RGB colors and handle them as such. But this is an assumption that will eventually fail for some documents. Alternatively you can start parsing ICC profiles. – mkl Sep 23 '19 at 14:24
  • @AsadRao Furthermore, the second new document does not use a clear blue but one built for the parameters `.17255 .3098 .63529`. Thus, you additionally have to soften your "blue recognition" code from checking *is approximately `0 0 1`* to something like *is `r g b` with `r+g≪b`*. – mkl Sep 23 '19 at 14:31
  • @AsadRao Hhmmm, it's probably not as easy as `r+g≪b`, see this answer: https://stackoverflow.com/a/17670830/1729265 - essentially it switches to HSV colors and even there can only give approximate ranges; finally it goes back to RGB and proposes `if( max( red, green, blue) == blue)`... – mkl Sep 23 '19 at 14:53
  • Right, so what would this value be then: `color.equals(getGraphicsState().getFillColor())`? – Asad Rao Sep 23 '19 at 15:01
  • @AsadRao I'm not really an expert on color space transformations etc. For your two example documents it is easy to adapt the code here (by assuming three-number sc and scn instructions to be RGB and by softening the blue test). But you should first ask the person who assigned you these tasks (changing blue to black / deleting blue lines) which kinds of blue, and which ways of constructing them, your code is expected to support. Because if that person says "all", you can reply that it will likely take many months or years of work. – mkl Sep 23 '19 at 15:15
  • The task is just to remove the underlines from hyperlinks and change blue-looking text to black. – Asad Rao Sep 23 '19 at 15:37
  • But what is "blue looking text"? Your program has no eyes... For your two example documents it is easy to adapt the code here, but then you'll likely find yet more variants of blue. – mkl Sep 23 '19 at 15:48
  • In most cases it would be the kinds of blue that exist in the files above. We can ignore other blues, or I will add more blues from time to time if needed, but for now I want a solution for these three blues. – Asad Rao Sep 23 '19 at 15:54
  • I'll look into that tomorrow.
  • have you get a chance to look into that. – Asad Rao Sep 25 '19 at 08:17
  • *"have you get a chance to look into that."* - Stack Overflow should have informed you about an edit to my answer 19 hours ago. – mkl Sep 25 '19 at 08:42
  • Thanks a lot for your answer. But as you already said, it can cause problems: it removes the color from logos and graphs. I have run the code on a file which has a graph, and it changes the blue color of that as well, which is not good. Anyway, thanks a bundle @mkl. Just one last thing: I have come to the conclusion that I will change the color of the text only if it starts with https or http, because my goal is to remove the link styling. Can you please, one last time, modify the code so that it removes the underline only from text which contains https or http? – Asad Rao Sep 25 '19 at 14:35
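mkl's suggestion for removing the underlines (replace the painting instruction by `n`, which ends the current path without painting it, but only while the applicable current color is blue) can be sketched without iText as follows. The class `ToyBlueUnderlineRemover` is hypothetical and makes several simplifying assumptions: whitespace tokenization (string operands must not contain spaces), any three-component rg/RG/sc/scn/SC/SCN color is treated as RGB, q/Q nesting is tracked by hand, and a fill-and-stroke path (B, b, ...) is dropped entirely if either of its colors is blue. In a real `PdfCanvasEditor` subclass, the processor's graphics state (`getGraphicsState().getFillColor()` / `getStrokeColor()`) would provide the current colors instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Hypothetical toy: drop paths painted in a blue color; text and everything else is kept.
class ToyBlueUnderlineRemover {
    static final Set<String> FILL_SETTERS = Set.of("rg", "g", "k", "sc", "scn");
    static final Set<String> STROKE_SETTERS = Set.of("RG", "G", "K", "SC", "SCN");
    static final Set<String> FILL_PAINTERS = Set.of("f", "F", "f*");
    static final Set<String> STROKE_PAINTERS = Set.of("S", "s");
    static final Set<String> BOTH_PAINTERS = Set.of("B", "B*", "b", "b*");

    // Same heuristic as isBlue in AllRgbBlueToBlackConverter; only 3-component colors count.
    static boolean isBlue(List<String> operands) {
        if (operands.size() != 3) return false;
        try {
            float r = Float.parseFloat(operands.get(0));
            float g = Float.parseFloat(operands.get(1));
            float b = Float.parseFloat(operands.get(2));
            return b > .5f && r < .9f * b && g < .9f * b;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isOperand(String token) {
        return token.startsWith("/") || token.startsWith("(") || token.matches("[-+]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    static String removeBluePaths(String contentStream) {
        boolean fillBlue = false, strokeBlue = false;
        Deque<boolean[]> saved = new ArrayDeque<>(); // q/Q graphics state stack (colors only)
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : contentStream.trim().split("\\s+")) {
            if (isOperand(token)) {
                operands.add(token);
                continue;
            }
            if (token.equals("q")) {
                saved.push(new boolean[] {fillBlue, strokeBlue});
            } else if (token.equals("Q") && !saved.isEmpty()) {
                boolean[] state = saved.pop();
                fillBlue = state[0];
                strokeBlue = state[1];
            } else if (FILL_SETTERS.contains(token)) {
                fillBlue = isBlue(operands);
            } else if (STROKE_SETTERS.contains(token)) {
                strokeBlue = isBlue(operands);
            }
            boolean dropPath = (FILL_PAINTERS.contains(token) && fillBlue)
                    || (STROKE_PAINTERS.contains(token) && strokeBlue)
                    || (BOTH_PAINTERS.contains(token) && (fillBlue || strokeBlue));
            out.addAll(operands);
            out.add(dropPath ? "n" : token); // "n" ends the path without painting it
            operands.clear();
        }
        out.addAll(operands);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String page = "0 0 1 rg BT /F1 12 Tf 72 702 Td (link) Tj ET 72 700 60 0.75 re f 0 g 72 650 60 0.75 re f";
        System.out.println(removeBluePaths(page));
    }
}
```

As warned in the comments, this removes every blue-painted path, so blue parts of logos or charts disappear as well.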