
My goal is to draw a rectangle over the searched text.

I already implemented a LocationTextExtractionStrategy class that joins text chunks into sentences (one per line) and returns each one's starting location, X and Y.

I was using the solution from "Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp", and here is what I have so far (below is the code for organizing chunks):

    public override void RenderText(TextRenderInfo renderInfo)
    {
        LineSegment segment = renderInfo.GetBaseline();
        if (renderInfo.GetRise() != 0)
        { // remove the rise from the baseline - we do this because the text from a super/subscript render operations should probably be considered as part of the baseline of the text the super/sub is relative to 
            Matrix riseOffsetTransform = new Matrix(0, -renderInfo.GetRise());
            segment = segment.TransformBy(riseOffsetTransform);
        }
        TextChunk tc = new TextChunk(renderInfo.GetText(), tclStrat.CreateLocation(renderInfo, segment));
        locationalResult.Add(tc);
    }

    public IList<TextLocation> GetLocations()
    {

        var filteredTextChunks = filterTextChunks(locationalResult, null);
        filteredTextChunks.Sort();

        TextChunk lastChunk = null;

        var textLocations = new List<TextLocation>();

        foreach (var chunk in filteredTextChunks)
        {

            if (lastChunk == null)
            {
                //initial
                textLocations.Add(new TextLocation
                {
                    Text = chunk.Text,
                    X = chunk.Location.StartLocation[0],
                    Y = chunk.Location.StartLocation[1]
                });

            }
            else
            {
                if (chunk.SameLine(lastChunk))
                {
                    var text = "";
                    // we only insert a blank space if the trailing character of the previous string wasn't a space, and the leading character of the current string isn't a space
                    if (IsChunkAtWordBoundary(chunk, lastChunk) && !StartsWithSpace(chunk.Text) && !EndsWithSpace(lastChunk.Text))
                        text += ' ';

                    text += chunk.Text;

                    textLocations[textLocations.Count - 1].Text += text;

                }
                else
                {

                    textLocations.Add(new TextLocation
                    {
                        Text = chunk.Text,

                        X = chunk.Location.StartLocation[0],
                        Y = chunk.Location.StartLocation[1]
                    });
                }
            }
            lastChunk = chunk;
        }

        //now find the location(s) with the given texts
        return textLocations;

    }

When I try to draw a rectangle at the coordinates of the text, it isn't even close to it. I'm drawing the rectangle like this:

PdfContentByte content = pdfStamper.GetOverContent(pageNumber);
iTextSharp.text.Rectangle rectangle = new iTextSharp.text.Rectangle(leftLowerX, leftLowerY, upperRightX, upperRightY);//pdfReader.GetPageSizeWithRotation(x);
rectangle.BackgroundColor = color;
content.Rectangle(rectangle);
mkl
Bartosz Olchowik
  • Please share an example PDF you experience the issue with. – mkl Jul 06 '18 at 14:24
  • [PDF Example](https://drive.google.com/open?id=1DU2fbncr9JvuwrFr1mFIskN8oXYydaMg) Let's look at page 21. – Bartosz Olchowik Jul 08 '18 at 19:12
  • Please set `pdfStamper.RotateContents = false` after instantiating the stamper. Your sample PDF has rotated pages. In this case iText tries to help you by using a different coordinate system when drawing. As the text extraction coordinate system remains unchanged, though, using extracted coordinates to draw something fails for rotated pages. The above setting disables this assistance. – mkl Jul 09 '18 at 13:47
  • Your knowledge is awesome, it works. Thank you for the simple and good solution! – Bartosz Olchowik Jul 09 '18 at 18:12
  • I'll make that an actual answer you can accept. – mkl Jul 10 '18 at 04:45
  • Can I accept your comment in some way, or do you have to post an 'Answer'? – Bartosz Olchowik Jul 10 '18 at 17:03

2 Answers


If you were to use iText 7 and pdfSweep, it literally has a function that does this.

RegexBasedCleanupStrategy st = new RegexBasedCleanupStrategy("the_word_to_highlight");

PdfAutoSweep sweep = new PdfAutoSweep(st);

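// Open in stamping mode: a PdfWriter is needed to save the highlights.
// outputfile is a placeholder name; it must be a different path than inputfile.
PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputfile), new PdfWriter(outputfile));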
sweep.highlight(pdfDocument);
pdfDocument.close();

That will highlight the words you're looking for. Of course you can do much more, with some minor configuration.
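
The snippet above uses Java-style method names. For a C# project, a rough equivalent might look like the sketch below. It assumes the iText 7 for .NET packages (itext7 and itext7.pdfsweep), their PascalCase method names, and the `iText.PdfCleanup.Autosweep` namespace; the class name `SweepHighlightSketch` and the parameter names are hypothetical. It also passes a `PdfWriter` with a separate output path, since the document has to be opened for writing before the highlights can be saved.

```csharp
using iText.Kernel.Pdf;
// Assumption: namespace of the pdfSweep .NET port; check it against the installed package.
using iText.PdfCleanup.Autosweep;

public static class SweepHighlightSketch
{
    // Highlights every match of 'regex' and writes the result to outputFile.
    // inputFile and outputFile must be different paths, otherwise the reader
    // and the writer compete for the same file.
    public static void Highlight(string inputFile, string outputFile, string regex)
    {
        RegexBasedCleanupStrategy strategy = new RegexBasedCleanupStrategy(regex);
        PdfAutoSweep sweep = new PdfAutoSweep(strategy);

        // Reader plus writer opens the document in stamping mode.
        PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile), new PdfWriter(outputFile));
        sweep.Highlight(pdfDocument);
        pdfDocument.Close();
    }
}
```

A call such as `SweepHighlightSketch.Highlight("in.pdf", "out.pdf", "the_word_to_highlight")` would leave `in.pdf` untouched and put the highlighted copy in `out.pdf`.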

Joris Schellekens
  • What about the license? Can I use iText 7 for educational purposes? – Bartosz Olchowik Jul 09 '18 at 10:46
  • Thank you. So can I, for example, remove the highlighted text? – Bartosz Olchowik Jul 09 '18 at 11:58
  • That's a different question. Please create a different SO question for that. And incentivize people to answer your questions by upvoting and accepting answers. – Joris Schellekens Jul 09 '18 at 12:00
  • It says I need a PdfWriter, and when I initialize a PdfWriter and create the PdfDocument with it, the document is totally empty. When I try to create the PdfDocument with both a PdfReader and a PdfWriter, it says the document is in use (the PdfWriter created first locks the file for the PdfReader), and I can't highlight any text without a PdfWriter. Have you got a solution for that? – Bartosz Olchowik Jul 09 '18 at 19:05
  • @BartoszOlchowik Please throw away your code, and start anew from an existing example from the web site. It seems that you are mixing different things that you shouldn't mix. Take a step back, clear your head, start anew. No one can help you with your question, because no one can reproduce the question you post. – Bruno Lowagie Jul 10 '18 at 10:14

Please set

pdfStamper.RotateContents = false;

after instantiating the stamper.

Your sample PDF has rotated pages. In this case iText 5.x by default tries to assist you by interpreting coordinates you give in drawing instructions in a different, rotated coordinate system. As the text extraction coordinate system remains unchanged, though, using extracted coordinates to draw something fails for rotated pages. The above setting disables this assistance.
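
To show where the setting goes, here is a minimal sketch for iTextSharp 5.x. The class name `HighlightSketch`, the method name `DrawBox`, and all parameter names are made up for illustration. The four coordinates are assumed to come from your text extraction code, and the semi-transparent yellow fill is only one possible way to keep the text readable under the rectangle.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class HighlightSketch
{
    // Draws a semi-transparent box on one page, using coordinates taken from
    // text extraction (that is, the page's unrotated coordinate system).
    public static void DrawBox(string inputPath, string outputPath, int pageNumber,
                               float llx, float lly, float urx, float ury)
    {
        PdfReader pdfReader = new PdfReader(inputPath);
        using (FileStream output = new FileStream(outputPath, FileMode.Create))
        {
            PdfStamper pdfStamper = new PdfStamper(pdfReader, output);
            // Set this right after creating the stamper, before drawing anything,
            // so that drawing uses the same coordinates as text extraction.
            pdfStamper.RotateContents = false;

            PdfContentByte content = pdfStamper.GetOverContent(pageNumber);
            content.SaveState();

            // Optional: make the fill translucent so the text stays visible.
            PdfGState gState = new PdfGState();
            gState.FillOpacity = 0.3f;
            content.SetGState(gState);

            Rectangle rectangle = new Rectangle(llx, lly, urx, ury);
            rectangle.BackgroundColor = BaseColor.YELLOW;
            content.Rectangle(rectangle);

            content.RestoreState();
            pdfStamper.Close();
        }
        pdfReader.Close();
    }
}
```

Writing to a separate `outputPath` keeps the source file readable while the stamper is open.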

mkl