I have a small C# app that extracts text from a Microsoft Publisher file via the COM interop API.
This works fine, but I'm struggling when there are multiple styles in one section. Potentially every character in a word could have a different font, format, etc.
Do I really have to compare character by character? Or is there something that returns the differently styled runs, similar to how I can get the individual paragraphs? This is what I have so far:
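    // pg is the current Page; text is a StringBuilder collecting the extracted text.
    foreach (Microsoft.Office.Interop.Publisher.Shape shp in pg.Shapes)
    {
        if (shp.HasTextFrame == MsoTriState.msoTrue)
        {
            TextRange textRange = shp.TextFrame.TextRange;
            text.Append(textRange.Text);

            // Words() takes a 1-based start index, so iterate from 1 to WordsCount.
            for (int i = 1; i <= textRange.WordsCount; i++)
            {
                TextRange range = textRange.Words(i, 1);
                string test = range.Text;
            }
        }
    }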
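To make the character-by-character fallback concrete, this is roughly the run splitting I am hoping to avoid writing myself. It is only a minimal sketch: `StyleRuns`, `Split` and the `styleKeyAt` callback are names I made up for illustration, and it deliberately knows nothing about Publisher. It just groups consecutive characters whose style key is identical.

```csharp
using System;
using System.Collections.Generic;

public static class StyleRuns
{
    // Splits a text of the given length into maximal runs of consecutive
    // characters that share the same style key.
    // styleKeyAt is a hypothetical callback: given a zero-based character
    // index, it returns a string describing that character's formatting.
    public static List<(int Start, int Length, string Key)> Split(
        int length, Func<int, string> styleKeyAt)
    {
        var runs = new List<(int Start, int Length, string Key)>();
        if (length <= 0)
        {
            return runs;
        }

        int runStart = 0;
        string runKey = styleKeyAt(0);

        for (int i = 1; i < length; i++)
        {
            string key = styleKeyAt(i);
            if (key != runKey)
            {
                // The style changed: close the current run and start a new one.
                runs.Add((runStart, i - runStart, runKey));
                runStart = i;
                runKey = key;
            }
        }

        // Close the final run.
        runs.Add((runStart, length - runStart, runKey));
        return runs;
    }
}
```

With Publisher, I assume the callback would read something like `textRange.Characters(i + 1, 1).Font` and concatenate `Name`, `Size`, `Bold` and `Italic` into the key, but I have not tested that. It also means one COM call per character, which is exactly the overhead I would like to avoid.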
Or is there a better way in general to extract the text from a Publisher file? I have to be able to write it back with the same formatting, because this is for a translation.