Using iTextSharp, how can I determine if a parsed chunk of text is both bolded and underlined?
Details:
I'm trying to parse PDF files in C#, specifically looking for text that is both bolded and underlined. Using iTextSharp, I can derive from LocationTextExtractionStrategy and get the text, the location, the font, etc. from the iTextSharp.text.pdf.parser.TextRenderInfo object passed to the overridden RenderText method.
However, determining whether the text is bold and/or underlined from the TextRenderInfo object has not been straightforward.
- I tried to use TextRenderInfo.GetFont() to find the font properties, but was unsuccessful.
- I can currently determine whether the text is bold by accessing the private GraphicsState field on the TextRenderInfo object and checking its .Font.PostscriptFontName property for the word "Bold". (Ugly, but appears to work.)
- Biggest issue: I haven't found anything to determine if the text is underlined. How can I determine this?
Here is my current attempt:
// Requires: using System.Reflection; using iTextSharp.text.pdf.parser;
// TextRenderInfo keeps its GraphicsState in the private field "gs"; fetch it via reflection.
private readonly FieldInfo _gsField = typeof(TextRenderInfo).GetField("gs",
    BindingFlags.GetField | BindingFlags.NonPublic | BindingFlags.Instance);

// Automatically called for each chunk of text in the PDF
public override void RenderText(TextRenderInfo renderInfo)
{
    base.RenderText(renderInfo);

    // UNDONE: Need to determine if text is underlined. How?
    // NOTE: renderInfo.GetFont().FontWeight does not contain any actual information
    var gs = (GraphicsState)_gsField.GetValue(renderInfo);
    var textChunkInfo = new TextChunkInfo(renderInfo);
    _allLocations.Add(textChunkInfo);

    if (gs.Font.PostscriptFontName.Contains("Bold"))
    {
        // Add this to our found collection
        FoundItems.Add(new TextChunkInfo(renderInfo));
    }

    if (!_lineHeights.Contains(textChunkInfo.LineHeight))
    {
        _lineHeights.Add(textChunkInfo.LineHeight);
    }
}
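The font-name check in RenderText could be pulled out into a small helper so it is easier to test on its own. This is only a minimal sketch: FontNameHints and LooksBold are hypothetical names, not part of iTextSharp, and it assumes the font's PostScript name actually contains the word "Bold", which is a naming convention rather than a guarantee (names such as "Heavy" or "Black" would be missed). It also strips the six-letter subset prefix that embedded subset fonts carry (for example "ABCDEF+Arial-BoldMT") so the prefix itself cannot produce a false match.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class FontNameHints
{
    // Returns true if the PostScript font name suggests a bold face.
    // Heuristic only: it relies on the font being named with "Bold".
    public static bool LooksBold(string postscriptFontName)
    {
        if (string.IsNullOrEmpty(postscriptFontName))
        {
            return false;
        }

        // Subset fonts are named "XXXXXX+RealName"; drop the six-letter tag.
        int plus = postscriptFontName.IndexOf('+');
        string name = plus == 6
            ? postscriptFontName.Substring(7)
            : postscriptFontName;

        // Case-insensitive search for "bold" in the remaining name.
        return name.IndexOf("Bold", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Inside RenderText, the condition would then read if (FontNameHints.LooksBold(gs.Font.PostscriptFontName)).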
The full source code of my current attempt is in the GitHub repository. Two example files (example.pdf and example2.pdf) are included, with text similar to what I'll be searching through.