How to get the coordinates (x, y) of a specific letter in a PDF using iTextSharp?

Question

I'm working with PDFs using iTextSharp. So far, I can already extract data from a specific area. However, I would like to make this more flexible by finding the coordinates of the first letter (or number) of a desired word and then, from those coordinates, building a rectangle to crop around that word. It would be great if anyone could give me a short example. Thank you.

The API of iText 7 is completely redesigned. You can use the ideas of the iText 7 code but the implementation looks decidedly different in iText 5.5.x. — mkl, Nov 15 '17 at 07:51
@mkl Is there no `HorizontalTextExtractionStrategy` in iTextSharp 5.5.10? I couldn't use it. I'm also facing an issue with text alignment. — tumsd923, Nov 16 '17 at 03:32
The `HorizontalTextExtractionStrategy` originally presented in [this answer](https://stackoverflow.com/a/33697745/1729265) for iText(Sharp) up to version 5.5.8 has already been ported there to versions 5.5.9 and up for Java as `HorizontalTextExtractionStrategy2`. It should not be too difficult to do the same for the .NET version. If you do mean that strategy, I can look into that port. — mkl, Nov 16 '17 at 08:42
@mkl I looked at your answer in [this topic](https://stackoverflow.com/questions/35344982/itext-extracted-text-from-pdf-file-using-locationtextextractionstrategy-is-in-w). Is there any way to use it in C#? Thanks. — tumsd923, Nov 17 '17 at 01:16
*"Is there any way to use it in C#?"* - As I already said in my previous comment: it should not be too difficult to do the same for the .NET version. If you do mean that strategy, I can look into that port. — mkl, Nov 17 '17 at 21:57

score 1 · Answer 1 · answered Nov 14 '17 at 13:38

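The basic idea here (using the iText 7 API) is to implement `IEventListener` so that you get notified of text rendering events, each of which delivers a `TextRenderInfo`. Split each of these into per-character `CharacterRenderInfo` objects, and then ask each of those for its bounding box.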

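// Package names as found in iText 7.1.x; adjust them if your version differs.
import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.TextRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.CharacterRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CharacterRenderInfoGetter implements IEventListener {

    // one entry per character encountered on the page
    private List<CharacterRenderInfo> characterRenderInfoList = new ArrayList<>();

    @Override
    public void eventOccurred(IEventData iEventData, EventType eventType) {
        if(eventType == EventType.RENDER_TEXT)
        {
            TextRenderInfo tri = (TextRenderInfo) iEventData;
            // split the text chunk into single characters and store each one
            for(TextRenderInfo subTri : tri.getCharacterRenderInfos())
            {
                characterRenderInfoList.add(new CharacterRenderInfo(subTri));
            }
        }
    }

    public List<CharacterRenderInfo> getCharacterRenderInfoList()
    {
        // relies on CharacterRenderInfo being Comparable in the iText version used
        java.util.Collections.sort(characterRenderInfoList);
        return characterRenderInfoList;
    }

    @Override
    public Set<EventType> getSupportedEvents() {
        // null means that all event types are supported
        return null;
    }
}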

You can then use this class like so:

    File inputFile = getInputFiles()[0]; // provide your own implementation of course

    // create an iText PdfDocument out of the File
    PdfDocument pdfDocument = new PdfDocument(new PdfReader(inputFile));

    // construct the IEventListener that will collect the per-character render info
    CharacterRenderInfoGetter characterRenderInfoGetter = new CharacterRenderInfoGetter();
    PdfCanvasProcessor processor = new PdfCanvasProcessor(characterRenderInfoGetter);

    /* Here we explicitly tell the processor to process page 1 (the first page of the document);
     * you can loop over all pages if you want to repeat this
     */
    processor.processPageContent(pdfDocument.getPage(1));

    // retrieve the collected characters; each one can be asked for its bounding box
    List<CharacterRenderInfo> characters = characterRenderInfoGetter.getCharacterRenderInfoList();

    pdfDocument.close();

I know this code is written in Java, but the .NET equivalent should be very similar. At the very least, it serves as pseudo-code.
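To get from the sorted character list to a rectangle around a desired word, one possible approach is to concatenate the characters, locate the word in the resulting string, and take the union of the bounding boxes of the matching characters. The following is only a sketch in plain Java: `CharBox` and `findWordBox` are hypothetical names that do not exist in iText, with `CharBox` standing in for whatever text and bounding box you read from each collected character. It also assumes the list is already in reading order and that each entry holds exactly one character.

```java
import java.util.ArrayList;
import java.util.List;

class WordBoxFinder {

    /** Hypothetical stand-in for one extracted character plus its bounding box. */
    static final class CharBox {
        final char ch;
        final float x, y, width, height; // lower-left corner and size, in PDF units

        CharBox(char ch, float x, float y, float width, float height) {
            this.ch = ch; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    /**
     * Returns {x, y, width, height} of the rectangle enclosing the first
     * occurrence of word, or null if the word is not found.
     * Assumes chars is already sorted in reading order.
     */
    static float[] findWordBox(List<CharBox> chars, String word) {
        if (word.isEmpty()) return null;

        // rebuild the page text so the word can be located by index
        StringBuilder text = new StringBuilder();
        for (CharBox c : chars) text.append(c.ch);
        int start = text.indexOf(word);
        if (start < 0) return null;

        // union of the bounding boxes of the matching characters
        float left = Float.MAX_VALUE, bottom = Float.MAX_VALUE;
        float right = -Float.MAX_VALUE, top = -Float.MAX_VALUE;
        for (int i = start; i < start + word.length(); i++) {
            CharBox c = chars.get(i);
            left = Math.min(left, c.x);
            bottom = Math.min(bottom, c.y);
            right = Math.max(right, c.x + c.width);
            top = Math.max(top, c.y + c.height);
        }
        return new float[] {left, bottom, right - left, top - bottom};
    }

    public static void main(String[] args) {
        // made-up sample data: six characters on one line, 10 units apart
        List<CharBox> chars = new ArrayList<>();
        String line = "Hi PDF";
        for (int i = 0; i < line.length(); i++) {
            chars.add(new CharBox(line.charAt(i), 10f * i, 100f, 8f, 12f));
        }
        float[] box = findWordBox(chars, "PDF");
        System.out.println(box[0] + ", " + box[1] + ", " + box[2] + ", " + box[3]);
    }
}
```

The first two values of the result are the lower-left corner of the word, which for horizontal text coincides with the lower-left corner of its first character. The four values could then be used to build the crop rectangle in whichever PDF library you use.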
