Update: 2021-01-15 - Added Bounty

I am trying to alter the redaction annotation to change the underlying text that gets burned into a PDF when you apply redactions. In Acrobat, you can set up a collection of "redaction codes" that can be used to identify why you are marking something as redacted. My goal is to overwrite what was selected by the user with a system-defined value. The code will be run prior to the redactions being applied.

In my attempts, I have discovered that the "preview" that is available in Acrobat products when hovering your cursor over a redact box is unique to Acrobat, and most other viewers won't show the preview. It also seems like the preview is maintained separately from the actual redaction that is applied. I don't need to alter the text that is shown in the preview, just what is shown after redactions are applied.

I have added a bounty of 150 reputation, as I don't think that I will be able to work out a solution on my own. My original question specified iText7, as that was the library that got me the closest in my own attempts. While I would prefer to use iText7, I will also consider solutions using other libraries that I can reasonably access (I do have a small budget that I could use to purchase another library, if I need to).

I've kept my original question and the follow-up with what I've personally tried below. I appreciate any help offered.

If you need a sample to test with, this DropBox folder has a file called 01 - Original.pdf that you can use as the source document. The desired result is to be able to change the text that appears when applying redactions from "Original Overlay Text" to any other value, such as "New Text".

Original Question:

I am trying to alter the text contained within every redaction annotation in a PDF, using iText7. The PdfRedactAnnotation object has a method called SetOverlayText() that looks like it should do what I want. So, I wrote a method that opens a PDF, loops through the pages, then loops through the annotations on each page, and checks if an annotation is a PdfRedactAnnotation. If it is, it calls SetOverlayText().

When debugging and looking at the annotation properties, I can see that the OverlayText has definitely changed. When I open the file and check the overlay text by hovering over a redaction marking with my cursor, however, the original overlay text is still there.

[Screenshot: hover preview still showing the original overlay text]

Additionally, if I apply the redactions, the original overlay text is what gets burned into the page.

However, when I right-click on the annotation (before applying redactions), the overlay text immediately gets updated to the new text:

[Screenshot: overlay text updated to the new text after right-clicking the annotation]

At this point, when I apply redactions, it's the new text that is burned into the PDF.

Is there any way that I can trigger the Redaction Annotation update programmatically, without having to open and right-click on every one? I've included my code below. Thank you for any advice anyone might be able to offer.

PdfDocument pdfDoc = new PdfDocument(new PdfReader(@"C:\temp\Test - Original.pdf"), new PdfWriter(@"C:\temp\Test - Output.pdf"));
Document doc = new Document(pdfDoc);
int pageCount = pdfDoc.GetNumberOfPages();
for (int i = 1; i <= pageCount; i++)
{
    var annotations = pdfDoc.GetPage(i).GetAnnotations();
    foreach(var annotation in annotations)
    {
        if (annotation is PdfRedactAnnotation)
        {
            PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
            redact.SetOverlayText(new PdfString("New Text"));
        }
    }
}
doc.Close();

Update: Findings as of 2021-01-07

As @mkl's answer points out, the PDF Redact Annotation Specification clarifies the underlying redact annotation DOM entries. OverlayText is just one part of the equation. If you use OverlayText then there must be a DA element defined (DA is a string that provides formatting info for the OverlayText). Finally, if RO is defined, it supersedes pretty much all of the other independent display entries.
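To see which of these entries a given annotation actually carries, the raw annotation dictionary can be dumped. The following is a minimal sketch, assuming iText 7 for .NET and the sample path used in the original question; the class and method names (`RedactEntryDump`, `Describe`) are hypothetical and only serve the illustration.

```csharp
using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;

public static class RedactEntryDump
{
    // Hypothetical helper: reports which redaction-related entries are present
    // in the raw annotation dictionary.
    public static string Describe(PdfDictionary annot)
    {
        bool hasRo = annot.ContainsKey(PdfName.RO);
        PdfString overlay = annot.GetAsString(PdfName.OverlayText);
        PdfString da = annot.GetAsString(PdfName.DA);

        string overlayText = overlay != null ? overlay.ToUnicodeString() : "(none)";
        string daText = da != null ? da.GetValue() : "(none)";
        return "RO present: " + hasRo + "; OverlayText: " + overlayText + "; DA: " + daText;
    }

    public static void Main()
    {
        // Path taken from the question; adjust to your own file.
        using (PdfDocument pdfDoc = new PdfDocument(new PdfReader(@"C:\temp\Test - Original.pdf")))
        {
            for (int i = 1; i <= pdfDoc.GetNumberOfPages(); i++)
            {
                foreach (PdfAnnotation annotation in pdfDoc.GetPage(i).GetAnnotations())
                {
                    if (annotation is PdfRedactAnnotation)
                    {
                        Console.WriteLine("Page " + i + ": " + Describe(annotation.GetPdfObject()));
                    }
                }
            }
        }
    }
}
```

If the output reports that RO is present, the specification says OverlayText and DA are ignored for that annotation, which would be consistent with a changed OverlayText having no visible effect.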

My testing document was made using Acrobat DC Pro, by manually adding a redaction in Acrobat. Doing this results in a Redact annotation with all of the above entries set. Copies of my test documents can be found in this DropBox folder.

(Side note: In my original question, I mention hovering over the redaction's red rectangle in order to preview what the applied redaction will look like... After testing in multiple browsers and other PDF Viewers like Foxit Reader, it looks like the function to 'preview' what the redaction will look like when applied by hovering your mouse over the red outline is only supported in Acrobat products. All other viewers tested will only show the red border, with nothing occurring when you hover your cursor over it. The black rectangles shown above can only be viewed in other programs after redactions have been applied.

Additional testing has shown that the hover-over preview is maintained separately from the redaction details itself, with Acrobat operating to try to keep the hover-over details in-sync with the underlying annotation. It is best to ignore the hover-over preview when testing, and refer to the results after applying redactions.)

@mkl's recommendation to remove the RO entry in order to try to let the OverlayText take priority was a good idea, but it unfortunately didn't work. There was no notable difference from my original results.

After poking around in iText7's PdfRedactAnnotation, I found that the following methods all result in a reference to the Redact object's RO entry:

PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
redact.GetRolloverAppearanceObject();
redact.GetRedactRolloverAppearance();
redact.GetPdfObject().Get(PdfName.RO);
redact.GetAppearanceDictionary().Get(PdfName.R);

(I confirmed they are in fact the exact same reference by checking the equality comparator. As reference types, they all returned true when tested using ==).

On further testing, I have concluded that the RO property must have a copy of the same OverlayText stored internally. If you have two redactions with different original values, you can "copy" the RO element from one redaction to another:

PdfObject ro = firstRedact.GetPdfObject().Get(PdfName.RO);
secondRedact.GetPdfObject().Put(PdfName.RO, ro);

If you do this and apply redactions, the "overlay text" from the first redact will have replaced the "overlay text" in the second. The other RO element values are also copied (such as BBox, which defines the black rectangle's dimensions)... but at least those elements can be adjusted.

The problem remains that the iText7 PdfObject of RO has 7 sub elements, and none of them or their descendant elements appear to expose the text that I'm trying to change.
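One likely reason the text does not show up among those sub-elements: an RO entry is a form XObject, which is a stream. Its dictionary entries (BBox, Resources, and so on) only describe it, while the drawing instructions, including any text-showing operators, sit in the stream data itself. The sketch below, assuming iText 7 for .NET, decodes that data so it can be inspected; the class and method names are hypothetical, and the text inside the stream may be encoded in a font-specific way, so it is not guaranteed to be readable.

```csharp
using System.Text;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;

public static class RedactOverlayInspector
{
    // Hypothetical helper: returns the decoded content stream of the RO form XObject,
    // or null when the annotation has no RO stream.
    public static string ReadOverlayContent(PdfRedactAnnotation redact)
    {
        PdfStream ro = redact.GetPdfObject().GetAsStream(PdfName.RO);
        if (ro == null)
        {
            return null;
        }

        // GetBytes() returns the stream data with its filters (e.g. FlateDecode) decoded.
        byte[] content = ro.GetBytes();

        // Content streams are byte-oriented; Latin-1 maps every byte to one character.
        return Encoding.GetEncoding("ISO-8859-1").GetString(content);
    }
}
```

Printing the returned string for an Acrobat-created redaction should show the operators that draw the box and the overlay text, which is the part that has to change for the burned-in result to change.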

My final test was whether I could copy RO elements from one PDF to another (so that I could use a second source PDF with an annotation with the desired RO "overlay text" already configured), but it looks like indirect objects don't like being .Put() into other documents.
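For moving an object between documents, iText 7 offers `PdfObject.CopyTo(PdfDocument)`, which clones the object (and what it references) into the target document instead of reusing the foreign indirect reference. The following is a sketch only, not something verified against the sample files: the class and method names are hypothetical, and it assumes the template document was opened in reading mode (with a `PdfReader` only), which `CopyTo` requires as far as I know.

```csharp
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Annot;

public static class RedactOverlayCopier
{
    // Hypothetical helper: copies the RO form XObject of a template annotation
    // (living in another document) onto a redact annotation in targetDoc.
    public static void CopyOverlay(PdfRedactAnnotation template,
                                   PdfRedactAnnotation target,
                                   PdfDocument targetDoc)
    {
        PdfObject ro = template.GetPdfObject().Get(PdfName.RO);
        if (ro == null)
        {
            return; // the template annotation has no RO entry to copy
        }

        // Clone the stream into the target document, then reference the clone.
        PdfObject roCopy = ro.CopyTo(targetDoc);
        target.GetPdfObject().Put(PdfName.RO, roCopy);
    }
}
```

As with copying RO between annotations of the same document, the BBox and other entries come along with the copy, so the overlay dimensions would still need to be adjusted to the target annotation.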

So now, I'm left with trying to either find a way to access/alter the text stored away in RO, or to clone a preconfigured RO from another document.

Chronicide

2 Answers

What does the specification say?

The OverlayText entry of redaction annotations is specified as

Key | Type | Value
OverlayText | text string | (Optional) A text string specifying the overlay text that should be drawn over the redacted region after the affected content has been removed. This entry is ignored if the RO entry is present.

(ISO 32000-2, Table 195 — Additional entries specific to a redaction annotation)

Maybe the redaction annotation in your source PDF has an RO entry, which would take precedence.

Furthermore, that table says this concerning the DA entry:

Key | Type | Value
DA | byte string | (Required if OverlayText is present, ignored otherwise) The appearance string that shall be used in formatting the overlay text when it is drawn after the affected content has been removed (see 12.7.4.3, "Variable text"). This entry is ignored if the RO entry is present.

If you use OverlayText, therefore, you also have to make sure the DA default appearance string is set. Did you?


The RO entry in the same table is specified as

Key | Type | Value
RO | stream | (Optional) A form XObject specifying the overlay appearance for this redaction annotation. After this redaction is applied and the affected content has been removed, the overlay appearance should be drawn such that its origin lines up with the lower-left corner of the annotation rectangle. This form XObject is not necessarily related to other annotation appearances, and may or may not be present in the AP dictionary. This entry takes precedence over the IC, OverlayText, DA, and Q entries.

So what to do now?

According to the details posted above, one obvious option to proceed is to create a redaction overlay XObject (RO) for the changed redaction annotations. You can do this by replacing your

if (annotation is PdfRedactAnnotation)
{
    PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
    redact.SetOverlayText(new PdfString("New Text"));
}

by

if (annotation is PdfRedactAnnotation)
{
    PdfRedactAnnotation redact = (PdfRedactAnnotation)annotation;
    redact.SetOverlayText(new PdfString("New Text"));

    // Size the overlay like the existing RO form XObject if there is one,
    // otherwise like the annotation rectangle.
    Rectangle rectangle = redact.GetRectangle().ToRectangle();
    PdfStream stream = redact.GetRedactRolloverAppearance();
    if (stream != null)
    {
        rectangle = stream.GetAsArray(PdfName.BBox).ToRectangle();
    }

    // Build the new overlay form XObject.
    PdfFormXObject redactionOverlay = new PdfFormXObject(rectangle);
    redactionOverlay.GetPdfObject().Put(PdfName.Matrix, new PdfArray(new double[] { 1, 0, 0, 1, -rectangle.GetX(), -rectangle.GetY() }));
    // pdfDoc is the PdfDocument opened in the question's code.
    using (Canvas canvas = new Canvas(redactionOverlay, pdfDoc))
    {
        // Fill the box black, then switch the fill color to white for the text.
        PdfCanvas pdfCanvas = canvas.GetPdfCanvas();
        pdfCanvas.SetFillColorGray(0);
        pdfCanvas.Rectangle(rectangle);
        pdfCanvas.Fill();
        pdfCanvas.SetFillColorGray(1);
        canvas.Add(new Paragraph("New Text"));
    }

    // Use the new XObject as the RO entry and as the rollover and down appearances.
    stream = redactionOverlay.GetPdfObject();
    redact.SetRolloverAppearance(stream);
    redact.SetDownAppearance(stream);
    redact.SetRedactRolloverAppearance(stream);
}

The result after redacting in Acrobat:

[Screenshot: the result after redacting in Acrobat]

By adapting the used fill colors and the paragraph style you can make the appearance correspond more closely to the Adobe Acrobat generated appearances (or you alternatively can generate a look completely of your own design).

Beware: I only have a fairly old Adobe Acrobat version available (v9.5), so current versions may not accept a redaction appearance generated as above, or may apply it differently.

mkl
  • Thank you for providing that reference information, it was very informative. While those specific properties were not able to address the issue, I was able to find the PDF documentation to be able to experiment further and find a workaround. I'll post an answer shortly that describes how I got it to work. – Chronicide Jan 06 '21 at 19:30
  • Update: It looks like a copy of the original OverlayText is being kept somewhere in the rollover entry in the AppearanceDictionary. Originally, I thought that removing the AppearanceDictionary resolved my problem. However, it led to a slew of glitches. So I think I need to update the AppearanceDictionary returned by .GetRolloverAppearance() to update the OverlayText there as well, but iText doesn't expose all of the redact appearance properties necessary to modify it for my needs. I may be out of luck... – Chronicide Jan 06 '21 at 21:14
  • Have you tried simply removing that *RolloverAppearance*? (Actually i assume that is a misnamed getter. It refers to the **RO** entry described in my answer which has nothing to do with *rollover* but which more likely is short for *redaction overlay*.) – mkl Jan 07 '21 at 07:23
  • I did try removing it. GetRolloverAppearanceObject(), GetRedactRolloverAppearance(), GetPdfObject().Get(PdfName.RO) and GetAppearanceDictionary().Get(PdfName.R) all return the same element: either a PdfStream or PdfDictionary of the redaction overlay. Remove(PdfName.RO) results in the same effect as described in my original question. GetPdfObject().GetAsDictionary(PdfName.AP).Remove(PdfName.R) results in no rollover effect at all. Remove(PdfName.AP) seemed to work when opened in Acrobat, but it was injecting a default AppearanceDictionary to fix things up. Nothing showed up in other viewers – Chronicide Jan 07 '21 at 14:41
  • Can you share your test PDF to allow reproducing the problem? That being said, the **OverlayText** and **RO** XObject don't need to show immediately; as you can read above, they shall show after the actual redaction process is executed, not immediately. You sound like you expect to see them immediately. – mkl Jan 07 '21 at 14:56
  • I don't know where I could host it, but it was just a PDF with a redaction added via Acrobat. When adding a redaction in Acrobat, it adds a rectangle with a red border and a transparent fill around the selected area. When you hover your cursor over the rectangle, it shows a 'preview' of what it will look like when redactions are applied (a black rectangle with red text, as seen in the images above). Whatever is visible when hovering over the red box to preview the redaction is what gets 'drawn' in place of the underlying content when redactions are applied. – Chronicide Jan 07 '21 at 15:32
  • The 'preview' seems to be the RO element. In my question above, the OverlayText in the preview wasn't updated until I manually right-clicked on the redaction rectangle, which triggered a refresh of the OverlayText in the RO element. When I applied the redactions without right-clicking first, the original redaction text would be used. If I right-clicked it to force an update and then applied the redactions, the updated text would be used. So while the RO element is what is drawn when redactions are applied (as described in the RO definition) I can also see it immediately when hovering over it – Chronicide Jan 07 '21 at 15:39
  • I don't have a current Adobe Acrobat version. *"I don't know where I could host it"* - You could create a public share on google drive or dropbox. – mkl Jan 07 '21 at 16:47
  • I updated my question with all of the information I've gathered while testing. I also have created a [DropBox Folder](https://www.dropbox.com/sh/oshuez8xi8dj92d/AAC73_fDvn5iKMvDQgESZiMha?dl=0) to host some samples. Thank you for the time you've committed to this so far. – Chronicide Jan 07 '21 at 20:01
  • Thank you, your code worked as is, and it helped me to better understand new ways for me to work with iText7. I appreciate all of the time and effort you put into helping me out! – Chronicide Jan 21 '21 at 19:33

I was able to change the redaction annotation overlay text and, upon redaction, have that text display correctly over the redacted block. I used the SyncFusion Essential PDF library that is included as a part of SyncFusion File Formats. (I am not affiliated with SyncFusion, though I do have a paid license to their File Formats libraries through my employer.) I tested with Adobe Acrobat Pro DC.

When I first attempted to replace the redaction overlay text, I ran into a similar issue with SyncFusion as the OP did with iText 7: the overlay would display as changed after running my code, but redaction would bring back the formerly replaced overlay text. As there was no way to change both the displayed text overlay and the overlay text accessible by the redaction process, I got around this issue by writing code that makes the desired changes, exports the PDF's annotations to a JSON file, deletes the PDF's annotations, and then imports the JSON file back into the PDF. This generates new annotations that have the same text value for both the text overlay and the redaction process (the redaction process overlay text, I believe, is generated as a result of the creation of the PDF annotation). This is the code using SyncFusion Essential PDF:

using System.Drawing;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Interactive;
using Syncfusion.Pdf.Parsing;
using Syncfusion.Pdf;

PdfLoadedDocument loadedDocument = new PdfLoadedDocument(@"C:\Users\Joe\Desktop\Redact\MarkedOriginal.pdf");
PdfLoadedPage page = loadedDocument.Pages[0] as PdfLoadedPage;
foreach (PdfLoadedRedactionAnnotation redactionAnnotation in loadedDocument.Pages[0].Annotations)
{
    PdfStandardFont font = new PdfStandardFont(PdfFontFamily.Helvetica, 10);
    redactionAnnotation.Font = font;
    redactionAnnotation.TextColor = Color.White;
    redactionAnnotation.BorderColor = Color.Black; //See note in SO answer about this
    redactionAnnotation.OverlayText = "New Text";
}

//Export, delete, and then import annotations to create a redaction annotation with the same preview and final redaction
loadedDocument.ExportAnnotations(@"C:\Users\Joe\Desktop\Redact\Output.json", AnnotationDataFormat.Json);

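// Remove from the last index down to 0 so that every annotation is deleted:
// a forward loop starting at 1 skips index 0 and runs past the shrinking collection.
for (int i = loadedDocument.Pages[0].Annotations.Count - 1; i >= 0; i--)
{
    loadedDocument.Pages[0].Annotations.RemoveAt(i);
}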
loadedDocument.ImportAnnotations(@"C:\Users\Joe\Desktop\Redact\Output.json", AnnotationDataFormat.Json);
loadedDocument.Save();
loadedDocument.Close(true);

If OP needs the border of the redaction marking boxes to be a color other than black, some more code will need to be written. I found that when I used redactionAnnotation.BorderColor = Color.Black; the redaction marking box looked as expected. However, when I used Color.Red or other colors, the black border was retained: in the file supplied by the OP, the first redaction ended up bordered in both black and the new color, and the second in black only. With further research, I suspect this can be remediated via SyncFusion, iText 7, or possibly by editing the JSON file's annotation defaultappearance line prior to importing the file back into the PDF. This is the defaultappearance line generated when I ran my code:

"defaultappearance": "1 1 1 RG 0 g 0 Tc 0 Tw 100 Tz 0 TL 0 Ts 0 Tr /Helv 10 Tf"

It's worth pointing out that SyncFusion has free and paid tiers for licensing their software. The SyncFusion Community License is, per SyncFusion, free for "companies and individuals with less than $1 million USD in annual gross revenue and 5 or fewer developers." The SyncFusion File Formats Developer License would cover everyone else.

joeschwa