As answered in the comments, one could use iText to perform such a task. Maybe there are some better solutions, however, I doubt it. The cause of the mentioned issue, i.e. "[itextsharp] sometimes give co-ords of the start of the sentence the search text is in", is that sometimes glyphs are so close, that their boxes overlap, hence I don't see how it could be handled as you want.
So you can do the following:
extend LocationTextExtractionStrategy
class and override eventOccurred
, for example, as follows:
@Override
public void eventOccurred(IEventData data, EventType type) {
if (type.equals(EventType.RENDER_TEXT)) {
TextRenderInfo renderInfo = (TextRenderInfo) data;
// Obtain all the necesary information from renderInfo, for example
LineSegment segment = renderInfo.getBaseline();
// ...
}
pass an instance of such an extended class to PdfTextExtractor.getTextFromPage
as follows:
PdfTextExtractor.getTextFromPage(pdfDocument.getPage(1), new ExtendedLocationTextExtractionStrategy()
once text is found, the event will be triggered.
There are some difficulties in such a solution, of course, because the text you want to find and write above could be present in the PDF not as "Text", but "T", "ex", t", or even "t", "x", "e", "T". However, since you use iText, you may want to harness the advantages of one of its products - pdfSweep
. This product aims to completely remove unnecessary content from the PDF, with such a content being passed either as some locations (which you want to obtain, so that is not an option) or regexes.
This is how to create such a regex strategy (to find all "Dolor" and "dolor" instances in the document, completely remove them (from all the streams, so that they are either not observed from a PDF viewer nor found in the underlying PDF objects):
RegexBasedCleanupStrategy strategy = new RegexBasedCleanupStrategy("(D|d)olor").setRedactionColor(ColorConstants.GREEN);
This is how to use it:
PdfAutoSweep autoSweep = new PdfAutoSweep(strategy);
autoSweep.cleanUp(pdf); // a PdfDocument instance
And this is how to write some text on the location, at which the unnecessary text was present:
for (IPdfTextLocation location : strategy.getResultantLocations()) {
Rectangle rect = location.getRectangle();
// do something, for exapmle, write some text
}