This is related to a previous question I asked about reading an annotation's appearance stream and writing its text to the Contents. I'd like to do a similar action with a Line annotation, reading its appearance width and setting the actual width to match the appearance.

I'm having trouble figuring out how to adapt my "set text contents using appearance" function to set line width. This is the code I'm currently using for getting the text:

// Main function: copy the text shown in the appearance stream into /Contents.
public void changeAnnotationContentToAppearance(PdfDictionary dict)
{
    string surface = pdfTextParser.retrieveText(dict);
    if (surface != null)
    {
        // Update /Contents with the appearance text.
        // For the line width, I think I would instead modify the /W value of the /BS dictionary.
        dict.Put(PdfName.CONTENTS, new PdfString(surface));
    }
}

// Get the text from the first appearance stream found in the /AP dictionary.
public string retrieveText(PdfDictionary annotDictionary)
{
    PdfDictionary appearancesDictionary = annotDictionary.GetAsDict(PdfName.AP);
    if (appearancesDictionary == null)
    {
        return null; // the annotation has no appearance dictionary
    }
    foreach (PdfName key in appearancesDictionary.Keys)
    {
        PdfStream value = appearancesDictionary.GetAsStream(key);
        if (value != null)
        {
            return ExtractAnnotationText(value);
        }
    }
    return null;
}

// Read the appearance stream and extract its text contents.
public String ExtractAnnotationText(PdfStream xObject)
{
    PdfDictionary resources = xObject.GetAsDict(PdfName.RESOURCES);
    ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

    PdfContentStreamProcessor processor = new PdfContentStreamProcessor(strategy);
    processor.ProcessContent(ContentByteUtils.GetContentBytesFromContentObject(xObject), resources);
    return strategy.GetResultantText();
}

ExtractAnnotationText() only seems capable of reading text, not line width, because ITextExtractionStrategy doesn't have any methods for returning line properties. Does iTextSharp offer another extraction strategy for working with lines?

If I'm reading right, this question, this one, and this one suggest that I would need to implement a class, but I'm not sure which one I should subclass for getting line data, or how exactly I would go about doing that.

EDIT: I'd also like to get the appearance data for the points defining a rectangle in a textbox. Though this could be a different question, it seems closely related to this problem: retrieving non-text graphical data defining an annotation's appearance stream.

sigil

2 Answers

You need the PathRenderInfo object to get information about lines and shapes. The PathRenderInfo object was introduced in iText 7. This is a proof of concept I wrote very quickly:

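import com.itextpdf.kernel.geom.IShape;
import com.itextpdf.kernel.geom.Point;
import com.itextpdf.kernel.geom.Subpath;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.PathRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;
import java.io.IOException;
import java.util.Set;

// SRC is the path to the PDF to inspect.
public static void main(String args[]) throws IOException {
    PdfDocument document = new PdfDocument(new PdfReader(SRC));
    PdfPage page = document.getPage(1);
    PdfCanvasProcessor processor = new PdfCanvasProcessor(new IEventListener() {
        public void eventOccurred(IEventData data, EventType type) {
            // Only path-painting events carry line and shape information.
            if (type == EventType.RENDER_PATH) {
                PathRenderInfo renderinfo = (PathRenderInfo) data;
                switch (renderinfo.getOperation()) {
                    case PathRenderInfo.STROKE:
                        System.out.print("Stroke: ");
                        break;
                    case PathRenderInfo.FILL:
                        System.out.print("Fill: ");
                        break;
                    default:
                        // Anything else, e.g. a path that is neither stroked nor filled.
                        System.out.print("No: ");
                }
                // Print the base points of every segment of every subpath.
                for (Subpath p : renderinfo.getPath().getSubpaths()) {
                    for (IShape shape : p.getSegments()) {
                        for (Point point : shape.getBasePoints()) {
                            System.out.println(String.format("x = %s; y = %s", point.getX(), point.getY()));
                        }
                    }
                }
            }
        }
        public Set<EventType> getSupportedEvents() {
            // null means: report all event types.
            return null;
        }
    });
    processor.processPageContent(page);
    document.close();
}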

I ran it on a PDF with plenty of lines and this was (part of) the output:

Stroke: x = -406.0; y = -240.0
x = 406.0; y = -240.0
x = -406.0; y = -200.0
x = 406.0; y = -200.0
x = -406.0; y = -160.0
x = 406.0; y = -160.0
x = -406.0; y = -120.0
x = 406.0; y = -120.0
x = -406.0; y = -80.0
x = 406.0; y = -80.0
x = -406.0; y = -40.0
x = 406.0; y = -40.0
x = -406.0; y = 0.0
x = 406.0; y = 0.0
x = -406.0; y = 40.0
x = 406.0; y = 40.0
x = -406.0; y = 80.0

You'll have to upgrade to iText 7 to make this work and you'll also have to explore which information is contained in the PathRenderInfo, Subpath and IShape objects.

Update:

As indicated in the comments, one could wonder if you're asking the right question. Take a look at this screen shot:

[Screenshot: the line annotation as rendered by the viewer]

If you look inside this PDF, you won't find an appearance stream:

[Screenshot: the inside of the PDF, with no appearance stream for the annotation]

The appearance is created by the viewer based on values such as:

  • /C: the color: red = 0, green = 0, blue = 1 (hence the line is blue)
  • /LE: the line endings (in this case: diamond shape)
  • /L: the line between (x = 20, y = 790) and (x = 575, y = 790)
  • ...

Why would you parse the appearance if you have all the necessary information at hand in the annotation dictionary?

The same goes for the rectangle of a text annotation: that information is stored in the /Rect value. In this case, the /Rect of the line annotation is dimensionless ([0 0 0 0]) because a line has only one dimension; the line itself is defined by the values stored in /L.

Bruno Lowagie
  • Is `PathRenderInfo` available under a different name in iTextSharp (my project is in C#)? Also, do you know if this approach works for Line annotations and annotation borders? Or just Shapes? – sigil May 26 '16 at 15:46
  • iText 7 for C# will be released in June. The functionality works for content streams (e.g. a page content stream, the appearance stream of an annotation,...). Note that annotations don't require an appearance stream. If you look at an annotation dictionary, you'll find border dictionaries, coordinates of line annotations, etc. In your question however, you explicitly asked for a way to parse the appearance stream. – Bruno Lowagie May 27 '16 at 04:35
  • I've updated my answer in case your question was wrong. – Bruno Lowagie May 27 '16 at 07:13
  • I might be asking my question incorrectly, if so, I apologize. I am able to access and change values such as `/Rect` and `/C` in the annotation dictionary. But when the appearance stream does not match these values (e.g. the border is black in the dictionary but red in the appearance stream), then when any change is made to the annotation, its appearance gets overwritten by the dictionary (the border changes from red to black). So I want to set the dictionary contents to the appearance stream. This problem occurs in some PDFs that are created in Foxit and then opened in Adobe. – sigil May 27 '16 at 19:03
  • If it wasn't clear from the above comment, I want to parse the appearance stream and use its values to set values in the annotation dictionary. – sigil May 27 '16 at 21:25
  • OK, now I understand: *if* there is an appearance stream, you want to adapt the annotation dictionary so that it matches the appearance. You'll need parsing capabilities that probably aren't present in iTextSharp 5 and that may or may not be present in iTextSharp 7 (I don't have sufficient experience with iText 7 yet to know). Wouldn't it be easier to remove the appearance stream so that the dictionary settings prevail? – Bruno Lowagie May 28 '16 at 09:40
  • No, we don't want to remove the appearance stream, because a requirement is that for any conflict between appearance and dictionary, the appearance data is trusted more. [This answer](http://stackoverflow.com/a/37022267/619177) describes how to read the appearance text, and that solution worked for me; is this not a similar problem? – sigil May 31 '16 at 15:59
  • The answer you refer to is about parsing for text. Your current question is about parsing for path-constructing and path-painting operators. [The answer by mkl](http://stackoverflow.com/a/37539246/1622493) explains why that's so difficult. Maybe you should hire a specialist to achieve your requirement. I don't know of any software that can do what you want out of the box. As @mkl explains, you might write something custom that assumes that the files are created by Foxit in a very specific way. – Bruno Lowagie May 31 '16 at 16:42
  • How do I read the appearance stream? I asked @mkl the same question in a comment on the answer you reference, but thought I'd check with you to see if you know as well and might be able to respond sooner. – sigil May 31 '16 at 16:49
  • **Reading** the appearance stream is easy: it's just a matter of getting the value that corresponds with the `/AP` key in the annotation dictionary. **Parsing** the appearance stream is the difficulty. I've explained it in great detail in my answer, but I can't help it if you don't (want to) understand that answer. – Bruno Lowagie May 31 '16 at 17:04
  • I've found an example where I get the normal appearance (`/N`) of the appearance (`/AP`) from the widget annotation of a signature field: [GetN2fromSig](http://developers.itextpdf.com/examples/security/inspect-digital-signatures#1346-getn2fromsig.java) I put that example on the official iText site a long time ago. It was very easy to find using the search field in the upper-right corner of the site. – Bruno Lowagie May 31 '16 at 17:08
  • I tried following and adapting that example, but it didn't work for a FreeText annotation. When I get the `/N` appearance using `GetAsStream()`, I can't find the raw data in any property of the stream. So I get the `/Resources` from the stream using `GetAsDict()`, but the Resources dictionary only has `/Font` dictionary and `/ProcSet` PdfArray. ProcSet has `/Text` and `/PDF`, which both return `null` no matter how I retrieve their values. I think I can write my own parser for this use case, if I can just figure out how to get to the raw text that @mkl apparently extracted using RUPS. – sigil Jun 01 '16 at 20:06
  • And when I try to write the value of the `/N` stream using `WriteContent()` to a FileStream, I get a blank file. – sigil Jun 01 '16 at 23:12
  • Have you looked at the streams using iText RUPS? Also: why are you babbling about `/ProcSet`? `ProcSet` was deprecated in PDF a long time ago. Viewers ignore it. – Bruno Lowagie Jun 02 '16 at 04:47
  • I examined the stream using RUPS, and found the data I'm looking for in the `/N` stream. I was able to figure out how to adapt the `GetN2fromSig` example to get the stream as a string, so now I need to work on parsing it. Thank you so much for all your help and patience. – sigil Jun 02 '16 at 23:44

The OP clarified in comments to @Bruno's answer

I want to parse the appearance stream and use its values to set values in the annotation dictionary.

and

when the appearance stream does not match these values (e.g. the border is black in the dictionary but red in the appearance stream), ... I want to set the dictionary contents to the appearance stream. This problem occurs in some PDFs that are created in Foxit and then opened in Adobe.

Unfortunately PDF allows many ways to create similar effects. To draw a border, e.g.,

  • you can stroke a path of four lines,
  • or you can stroke a rectangle,
  • or you can fill a rectangle and fill another, slightly smaller inner rectangle with white,
  • or you can fill everything with a rectangular mask,
  • or you can draw a bitmap with the desired form,
  • or you can ...
  • ...

Thus, a truly generic solution for your problem is somewhere between extremely complicated and impossible.

Furthermore some appearances may be impossible to represent using only the limited abstract settings in the annotation dictionary. E.g. a border in the appearance stream might be created solid in the middle but fading out left and right using transparency, or it may be drawn using a color shading operator resulting in a color gradient, or its form might not be exactly rectangular but instead irregular, or, or, ...


If you are not looking for a generic solution, though, but merely for a solution working for annotations created by certain software products like Foxit in some versions, and if the appearances created by that software can be represented using the abstract annotation dictionary values, the task becomes feasible.

In that case you should start by analyzing sample appearance streams created by those software products. Most likely some pattern will emerge.

As soon as you have found that pattern, you can start implementing a matching iTextSharp 5.5.x IExtRenderListener or iTextSharp 7.0.x IEventListener.
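
To give an idea of what such pattern-specific parsing boils down to, here is a minimal sketch in plain Java without any iText calls. It assumes you already have the decoded appearance stream as text (for example obtained as in the GetN2fromSig example mentioned in the comments) and that the producer writes a very simple stream: whitespace-separated tokens, no text strings, no inline images, no line width set via `gs`, and no `cm` scaling. The class `AppearanceScan` and its members are hypothetical names made up for this illustration. It records the operands of the last `w` (line width), `RG` (stroke color) and every `re` (rectangle) seen before the first stroking operator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: scans a decoded, very simple appearance stream for stroke settings. */
final class AppearanceScan {

    /** Values seen before the first stroking operator; null means "not set in the stream". */
    static final class StrokeInfo {
        Double lineWidth;                                     // operand of the last "w"
        double[] strokeRgb;                                   // operands of the last "RG"
        final List<double[]> rectangles = new ArrayList<>();  // x, y, width, height of each "re"
    }

    static StrokeInfo scan(String content) {
        StrokeInfo info = new StrokeInfo();
        Deque<Double> operands = new ArrayDeque<>();
        for (String token : content.trim().split("\\s+")) {
            // Numbers are operands; collect them until an operator arrives.
            if (token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.addLast(Double.parseDouble(token));
                continue;
            }
            switch (token) {
                case "w":   // line width
                    if (operands.size() >= 1) info.lineWidth = operands.peekLast();
                    break;
                case "RG":  // stroke color in DeviceRGB
                    if (operands.size() >= 3) info.strokeRgb = lastN(operands, 3);
                    break;
                case "re":  // rectangle: x y width height
                    if (operands.size() >= 4) info.rectangles.add(lastN(operands, 4));
                    break;
                case "S": case "s": case "B": case "b": case "B*": case "b*":
                    return info; // first stroking operator: report what was in effect
                default:
                    break;       // every other operator is ignored in this sketch
            }
            operands.clear();    // operands belong to the operator just handled
        }
        return info;             // no stroking operator found; values are still returned
    }

    /** Removes and returns the last n operands in their original order. */
    private static double[] lastN(Deque<Double> operands, int n) {
        double[] out = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = operands.pollLast();
        }
        return out;
    }

    public static void main(String[] args) {
        StrokeInfo info = scan("q 1 0 0 RG 2.5 w 10 10 100 50 re S Q");
        System.out.println("width = " + info.lineWidth);
    }
}
```

The width found this way is the candidate for the `/W` entry of the annotation's `/BS` dictionary, and the `RG` operands for `/C`. As soon as a stream uses `q`/`Q` nesting, transformations, or any of the alternative drawing techniques listed above, this naive scan gives wrong or incomplete results, which is why a real implementation should sit on top of the parser listeners named above.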

mkl
  • In [this answer](http://stackoverflow.com/a/37022267/619177) you gave a sample of appearance stream data. How did you read that? I haven't been able to find where it is in the `PdfStream` object. – sigil May 31 '16 at 16:02
  • See [GetN2fromSig](http://developers.itextpdf.com/examples/security/inspect-digital-signatures#1346-getn2fromsig.java) – Bruno Lowagie May 31 '16 at 17:10
  • @sigil Bruno's reference shows how one can read the contents of a stream in one's own code. To retrieve the stream in the answer you point to, though, I used [RUPS](http://itextpdf.com/Products/itext-rups), a tool based on iText. – mkl Jun 01 '16 at 06:32