I'm trying to extract all the text from each slide of a PowerPoint file. For some reason I'm only getting some of the text, not all of it. I'm looping through all the shapes on each slide and checking for both text frames and tables, but some slides that contain text print nothing.

Here's a screenshot of a slide that printed only the title and no other text.

Code

foreach (PowerPoint.Slide _slide in pptPresentation.Slides) {
    foreach(PowerPoint.Shape _shape in _slide.Shapes) {
        //check for textframes
        if (_shape.HasTextFrame == MsoTriState.msoTrue) {
            var textFrame = _shape.TextFrame;

            if (textFrame.HasText == MsoTriState.msoTrue) {
                var textRange = textFrame.TextRange;
                PrintAllParagraphs(textRange);
            } 
        }

        //check for tables
        if(_shape.HasTable == MsoTriState.msoTrue) {
            var slideTable = _shape.Table;
            int rowCount = slideTable.Rows.Count;
            int colCount = slideTable.Columns.Count;

            for(int y = 1; y <= rowCount; y++) {
                for(int x = 1; x <= colCount; x++) {
                    var tRange = slideTable.Cell(y, x).Shape.TextFrame.TextRange;
                    PrintAllParagraphs(tRange);
                }
            }
        }
    } //loop shapes
} //loop slides

print function

public void PrintAllParagraphs(PowerPoint.TextRange textRange) {
    for (int i = 1; i <= textRange.Paragraphs().Count; i++) {
        // Paragraphs is 1-based; fetch each paragraph once and reuse it
        PowerPoint.TextRange paragraph = textRange.Paragraphs(i);
        bool hasBullet = paragraph.ParagraphFormat.Bullet.Type != PowerPoint.PpBulletType.ppBulletNone;
        // Prefix bulleted paragraphs with "* "
        Console.WriteLine(hasBullet ? "* " + paragraph.Text : paragraph.Text);
    }
}

Are there other things I should be checking within the shapes of a slide? Any help would be appreciated. Thanks.

  • Is there anything identifiably different about the slides where it fails to pull the text? Are the shapes set up differently, is there overflow, etc.? – JoeTomks Jul 15 '19 at 16:32
  • Yeah, actually, I just figured out it's a SmartArt that contains text inside. I'm able to extract some text now, but not all of it: it's giving me the last row of each text frame, so in the screenshot example above I would get "text" and "Text". I'm definitely closer to getting what I want compared to an hour ago. – Eric Jul 15 '19 at 16:36
  • I figured it out @Digitalsa1nt, going to post an answer below. – Eric Jul 15 '19 at 16:47

1 Answer

Okay, it turns out that this is a SmartArt, which is why checking for text frames and tables did not detect its text.

All I had to do was loop over the nodes within the SmartArt and grab the text from each node's TextRange. I noticed the text is separated by "\r", so by splitting on it I was able to get the correct output.

//check for SmartArt
if(_shape.HasSmartArt == MsoTriState.msoTrue) {
    foreach( SmartArtNode node in _shape.SmartArt.AllNodes) {
        var txtRange = node.TextFrame2.TextRange;
        var txt = txtRange.Paragraphs.Text.Split(new string[] { "\r" }, StringSplitOptions.None);

        foreach(string line in txt) 
            Console.WriteLine(line);
    }
}
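
The splitting step can also be pulled out into a small helper that does not touch the interop types, which makes it easy to try without PowerPoint. This is only a sketch: `SmartArtTextHelper` and `SplitParagraphs` are hypothetical names, not part of the PowerPoint object model, and it assumes paragraphs are separated by "\r" as observed above. Unlike the loop above, it drops empty entries (for example the one left by a trailing "\r"), which may or may not be what you want.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the PowerPoint object model.
public static class SmartArtTextHelper
{
    // Splits raw node text on '\r' (the separator observed in the answer)
    // and skips empty entries, such as the one left by a trailing '\r'.
    public static List<string> SplitParagraphs(string rawText)
    {
        var lines = new List<string>();
        if (string.IsNullOrEmpty(rawText))
            return lines;

        foreach (string line in rawText.Split(new[] { '\r' }, StringSplitOptions.RemoveEmptyEntries))
            lines.Add(line);

        return lines;
    }

    public static void Main()
    {
        // Sample string standing in for the text read from a SmartArt node.
        foreach (string line in SplitParagraphs("Header\rtext\rText\r"))
            Console.WriteLine(line);
    }
}
```

With the sample string in `Main`, this writes three lines: Header, text and Text. Inside the SmartArt loop you would pass the node's text to `SplitParagraphs` instead of the sample string.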