
I have a small C# app that extracts text from a Microsoft Publisher file via the COM Interop API. This works fine, but I'm struggling when a single text range contains multiple styles. Potentially every character in a word could have a different font, format, etc.
Do I really have to compare character by character? Or is there something that returns the differently styled sections, similar to how I can get the individual Paragraphs?

foreach (Microsoft.Office.Interop.Publisher.Shape shp in pg.Shapes)
{
    if (shp.HasTextFrame == MsoTriState.msoTrue)
    {
        // Fetch the COM range once instead of on every loop iteration.
        TextRange frameRange = shp.TextFrame.TextRange;
        text.Append(frameRange.Text);

        // Words(start, length) is 1-based, so count from 1 to WordsCount.
        for (int i = 1; i <= frameRange.WordsCount; i++)
        {
            TextRange range = frameRange.Words(i, 1);
            string test = range.Text;
        }
    }
}

Or is there a better way in general to extract the text from a Publisher file? It's for a translation, so I have to be able to write the text back with the same formatting.

Remy

2 Answers


You could consider using the clipboard to copy text sections as RTF, which you can later paste back as RTF, as in the example below for Word. I am not familiar with Publisher's object model.

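// Word collections are 1-based. Range.Copy() puts the paragraph on the clipboard, including its RTF form.
wordDocument.Content.Paragraphs[1].Range.Copy();
string rtf = System.Windows.Forms.Clipboard.GetText(TextDataFormat.Rtf);
// Later, to write it back: put the RTF on the clipboard again and paste it into the target range.
System.Windows.Forms.Clipboard.SetText(rtf, TextDataFormat.Rtf);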

Other than that, I have not found a collection of applied styles when using interop with any of the Office products.

Raheel Khan
  • Thanks for the input. But with the RTF conversion I might lose some formatting options, which I would like to avoid. Currently I just compare every character against the next one... – Remy Apr 08 '12 at 14:52

We tried an approach where we compared each character with the next one across as many font properties as possible. Not pretty, but it works in most cases...
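
A minimal sketch of that character-by-character comparison, kept independent of COM so it can be tried in isolation. The `StyleRuns` class and its `styleAt` callback are hypothetical names made up for illustration; the callback returns whatever style key you decide to compare for the character at a given index.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits text into runs of consecutive characters
// that share the same style key.
public static class StyleRuns
{
    public static List<(string Text, TStyle Style)> Split<TStyle>(
        string text, Func<int, TStyle> styleAt)
    {
        var runs = new List<(string Text, TStyle Style)>();
        if (string.IsNullOrEmpty(text)) return runs;

        var comparer = EqualityComparer<TStyle>.Default;
        var current = new StringBuilder();
        TStyle runStyle = styleAt(0);

        for (int i = 0; i < text.Length; i++)
        {
            TStyle style = styleAt(i);
            if (!comparer.Equals(style, runStyle))
            {
                // Style changed: close the current run and start a new one.
                runs.Add((current.ToString(), runStyle));
                current.Clear();
                runStyle = style;
            }
            current.Append(text[i]);
        }

        runs.Add((current.ToString(), runStyle)); // close the final run
        return runs;
    }
}

// Assumed usage with Publisher interop (not tested here): build the key
// from the font of each single-character range, for example
//   var tr = shp.TextFrame.TextRange;
//   var runs = StyleRuns.Split(tr.Text, i =>
//   {
//       var f = tr.Characters(i + 1, 1).Font;   // Characters() is 1-based
//       return (f.Name, f.Size, f.Bold, f.Italic);
//   });
```

Using a value tuple as the key means adding another property to the comparison is a one-line change. Every `Characters(...)` call is a COM round trip, so this will be slow on long text frames.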

Remy