I have already read all the related StackOverflow questions and haven't found a decent solution. I want to open a PDF, get the text (words) and their coordinates, and then add a sticky note next to some of them.
It seems to be mission impossible; I'm stuck.
Why does this code correctly find all the words on a page (but not their coordinates)?
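    using System;
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder sb = new StringBuilder();
        for (int page = 5; page <= 5; page++) // only page 5 while testing
        {
            // Fresh strategy per page: SimpleTextExtractionStrategy accumulates text
            ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
            string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
            sb.AppendLine(text);
        }
        Console.WriteLine(sb.ToString());
    }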
This one, on the other hand, does get coordinates, but only for "chunks", and I can't rely on those being in the proper order.
    using System;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    using (PdfReader reader = new PdfReader(path))
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        LocationTextExtractionStrategyEx strategy;
        for (int i = 5; i <= 5; i++) // normally: i <= reader.NumberOfPages
        {
            //strategy = parser.ProcessContent(i, new SimpleTextExtractionStrategy());
            // new MyLocationTextExtractionStrategy("sample", System.Globalization.CompareOptions.None)
            strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));
            foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
            {
                // A chunk typically holds one text-showing operation, not necessarily a whole word
                if (chunk.m_text.Trim() == "MCU_MOSI")
                    Console.WriteLine("Bingo"); // <-- NEVER HIT
            }
            //Console.WriteLine(strategy.m_SearchResultsList.ToString()); // strategy.GetResultantText() +
        }
    }
This uses a class (slightly modified by me) from this post: Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp
But it only finds useless "chunks": the loop above never matches "MCU_MOSI".
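One possible way around the chunk problem, sketched here under assumptions rather than as a known iTextSharp feature: collect per-glyph boxes instead of chunks and rebuild the words yourself. In iTextSharp 5, a custom `IRenderListener` passed to `PdfReaderContentParser.ProcessContent` can call `GetCharacterRenderInfos()` on each `TextRenderInfo` it receives and read each glyph's `GetBaseline()` start and end points; that wiring is not shown. The `CharBox`, `WordBox` and `WordGrouper` names and both tolerance values are hypothetical, and the records need C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical per-glyph box in PDF user space (y grows upward).
public record CharBox(string Text, float Left, float Right, float BaselineY);

// Hypothetical reassembled word: horizontal extent plus baseline.
public record WordBox(string Text, float Left, float Right, float BaselineY);

public static class WordGrouper
{
    // lineTolerance: max baseline difference (points) for glyphs on the same line (assumed value).
    // gapTolerance: horizontal gap (points) that starts a new word without a space glyph (assumed value).
    public static List<WordBox> GroupWords(IEnumerable<CharBox> chars,
                                           float lineTolerance = 2f,
                                           float gapTolerance = 1.5f)
    {
        // 1. Bucket glyphs into lines, top of the page first.
        var lines = new List<List<CharBox>>();
        foreach (CharBox c in chars.OrderByDescending(ch => ch.BaselineY))
        {
            List<CharBox> current = lines.Count > 0 ? lines[lines.Count - 1] : null;
            if (current != null && Math.Abs(current[0].BaselineY - c.BaselineY) <= lineTolerance)
                current.Add(c);
            else
                lines.Add(new List<CharBox> { c });
        }

        // 2. Within each line, walk left to right; split on whitespace glyphs or wide gaps.
        var words = new List<WordBox>();
        foreach (List<CharBox> line in lines)
        {
            float baseline = line[0].BaselineY;
            var sb = new StringBuilder();
            float left = 0f, right = 0f;
            foreach (CharBox c in line.OrderBy(ch => ch.Left))
            {
                bool isSpace = string.IsNullOrWhiteSpace(c.Text);
                bool wideGap = sb.Length > 0 && c.Left - right > gapTolerance;
                if ((isSpace || wideGap) && sb.Length > 0)
                {
                    words.Add(new WordBox(sb.ToString(), left, right, baseline));
                    sb.Clear();
                }
                if (isSpace) continue;
                if (sb.Length == 0) left = c.Left;
                sb.Append(c.Text);
                right = c.Right;
            }
            if (sb.Length > 0)
                words.Add(new WordBox(sb.ToString(), left, right, baseline));
        }
        return words;
    }

    // Every exact occurrence of a word, e.g. "MCU_MOSI".
    public static List<WordBox> Find(IEnumerable<WordBox> words, string target) =>
        words.Where(w => w.Text == target).ToList();
}
```

Both tolerances are guesses in points and will likely need tuning for a given PDF; a word hyphenated or broken across two lines will not be found by exact matching.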
So the question is: can iTextSharp really locate words on a page, so that I can add sticky notes next to them? Thank you.
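For the sticky-note half, a minimal sketch assuming iTextSharp 5.x (the version that has `PdfReaderContentParser`): `PdfStamper` copies the document and `PdfAnnotation.CreateText` builds a text ("sticky note") annotation at a given rectangle. `StickyNotes.AddNote` and its parameters are hypothetical names. The rectangle is in PDF user space (points, origin normally at the bottom-left of the page), which should match the coordinates reported by text extraction as long as page rotation and unusual crop boxes are ignored.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class StickyNotes
{
    // Hypothetical helper: returns a copy of the PDF with one text annotation on the given page.
    public static byte[] AddNote(byte[] pdf, int page, Rectangle rect, string title, string contents)
    {
        var reader = new PdfReader(pdf);
        using (var output = new MemoryStream())
        {
            var stamper = new PdfStamper(reader, output);
            // open = false: the note shows as an icon; "Comment" is one of the standard icon names
            PdfAnnotation note = PdfAnnotation.CreateText(stamper.Writer, rect, title, contents, false, "Comment");
            stamper.AddAnnotation(note, page);
            stamper.Close();   // also closes output; ToArray() still works afterwards
            reader.Close();
            return output.ToArray();
        }
    }
}
```

For example, `new Rectangle(right + 2, baseline, right + 22, baseline + 20)` would place a 20 x 20 pt note just to the right of a word whose right edge is `right` and whose baseline is `baseline` (both hypothetical values taken from whatever extraction step finds the word).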