PowerPoint OpenXML whitespace is disappearing

Question

I'm running into a problem where whitespace is removed from PowerPoint documents as soon as I reference a slide. The following code sample illustrates what I mean:

    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Presentation;

    //Open the document.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(pptxFileName, true))
    {
        //Just making this reference modifies the whitespace in the slide.
        Slide slide = presentationDocument.PresentationPart.SlideParts.First().Slide;
    }

To reproduce this issue, create a presentation with a single slide containing a single text box with the text "[ ]" (no quotes). Then set the font color of the space between the square brackets to a different color than the rest of the text. This results in a Run containing only whitespace characters. When the code above is run against this presentation, the line that references the slide causes the whitespace in that Run to disappear, leaving us with a presentation that looks different from the one we started with, even though we never explicitly changed anything: when opened in the PowerPoint application, the text now reads "[]".

In Word, the xml:space attribute can be set to 'preserve' on text elements to preserve whitespace, but there appears to be no equivalent for PowerPoint.

This is a critical problem in situations where whitespace is used as a key component of slide design. Has anybody figured out a workaround for this issue?

There has GOT to be something else going on...your code above doesn't even perform any IO on the file. Even if it did, all it should be doing is creating a reference to the first slide....I recommend taking a look at the raw XML either by extracting it from the pptx manually or using PackageExplorer (http://packageexplorer.codeplex.com/). — Chris B. Behrens, Aug 22 '11 at 20:54
I agree with you 100%, this makes no sense and I must be doing something wrong; I just can't figure out what. I have been looking at the raw XML, and the text element actually changes from `<a:t> </a:t>` (containing a single space) to an empty `<a:t>` element after running the code I posted above. I have even tried adding the xml:space="preserve" attribute at various places in the file (run, paragraph, text, slide, presentation), to no avail. – ptrc, Aug 23 '11 at 13:18
I'm going to be digging into a PresentationML project tonight...it shouldn't take me long to reproduce this scenario. I'll let you know if I can figure anything out. — Chris B. Behrens, Aug 23 '11 at 15:21
Great, I'm looking forward to hearing back from you. It should be pretty easy to reproduce. — ptrc, Aug 23 '11 at 16:12
Okay, here's what I found: before I even got to the code, I recreated your presentation and viewed it in PackageExplorer. When the color is the same, i.e., the run properties are consistent across the three characters "[ ]", the whitespace is represented. When I changed the color of the space as you indicated, it broke the text into three runs, as we would expect. What was not expected is that it does NOT represent the whitespace. – Chris B. Behrens, Aug 24 '11 at 00:23
All I can figure is that there is some subtle quirk about representing whitespace that we're missing here. — Chris B. Behrens, Aug 24 '11 at 00:25
Yeah, if you use a diff program to compare the slide1.xml file before and after, you can see there are several other changes made as well. After doing nothing but opening the file. This is a very odd issue, it almost feels like something was left out. — ptrc, Aug 24 '11 at 13:34
I would try to contact Eric White (http://blogs.msdn.com/b/ericwhite/) directly...he seems to be the Internet expert on these things. — Chris B. Behrens, Aug 24 '11 at 14:15

score 7 · Accepted Answer · answered Aug 25 '11 at 06:13

Yes, you have found a bug in the SDK.

@Chris, first of all, that code is, per the semantics of the Open XML SDK, modifying the file. When you access the contents of the part, and then go out of scope of the using statement, the contents of the part are written back into the package. This is because the presentation was opened for read/write (the second argument of the call to the Open method).

The problem is that when the contents of the part are read from the package, the space is being stripped off.

    //Open the document.
    using (PresentationDocument presentationDocument = PresentationDocument.Open("test.pptx", true))
    {
        //Just making this reference modifies the whitespace in the slide.
        Slide slide = presentationDocument.PresentationPart.SlideParts.First().Slide;
        var sh = slide.CommonSlideData.ShapeTree.Elements<DocumentFormat.OpenXml.Presentation.Shape>().First();
        Run r = sh.TextBody.Elements<Paragraph>().First().Elements<Run>().Skip(1).FirstOrDefault();
        Console.WriteLine(">{0}<", r.Text.Text);
        //r.Text.Text = " ";
    }

If you run the above code on the presentation, you can see that by the time you access that text element, the text of the text element is already incorrect.

Interestingly, if you uncomment the line that sets the text, the saved slide does contain the space.
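To see how a whitespace-only text node can vanish at read time without the SDK involved, the hypothetical `WhitespaceDemo` class below loads a run sequence like the one in the question through an `XmlReader`, once with `XmlReaderSettings.IgnoreWhitespace` off (the default) and once with it on. With it on, the space inside the middle `<a:t>` is reported as insignificant whitespace and skipped. This is only an illustration of one plausible mechanism; it is an assumption, not a description of what the SDK's parser actually does internally.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Illustration only: an ASSUMED mechanism for how a whitespace-only <a:t>
// value can be lost while reading; this is not the Open XML SDK's code.
public static class WhitespaceDemo
{
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Three runs like the question's "[ ]": "[", " ", "]".
    // No whitespace between tags, so the only whitespace is the middle run's text.
    public const string ParagraphXml =
        "<a:p xmlns:a='http://schemas.openxmlformats.org/drawingml/2006/main'>" +
        "<a:r><a:t>[</a:t></a:r>" +
        "<a:r><a:t> </a:t></a:r>" +
        "<a:r><a:t>]</a:t></a:r>" +
        "</a:p>";

    // Loads the paragraph through an XmlReader and returns the second run's text.
    public static string SecondRunText(bool ignoreWhitespace)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = ignoreWhitespace };
        using (var reader = XmlReader.Create(new StringReader(ParagraphXml), settings))
        {
            // XDocument.Load(XmlReader) keeps the whitespace nodes the reader reports,
            // so the reader settings decide whether the space survives.
            XDocument doc = XDocument.Load(reader);
            XElement t = doc.Root.Elements(A + "r").ElementAt(1).Element(A + "t");
            return t.Value;
        }
    }
}
```

`WhitespaceDemo.SecondRunText(false)` returns a single space, while `WhitespaceDemo.SecondRunText(true)` returns an empty string. The `GetXDocument` helper later in this answer creates its reader with `XmlReader.Create(partStream)` and default settings, which is consistent with the space surviving on that path.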

This is obviously a bug. I have reported it to the program manager at Microsoft who is responsible for the Open XML SDK.

As this scenario is important to you, I recommend that you use LINQ to XML for your code. The following code works fine:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Xml;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Presentation;
    using DocumentFormat.OpenXml.Drawing;

    public static class PtOpenXmlExtensions
    {
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument partXDocument = part.Annotation<XDocument>();
            if (partXDocument != null)
                return partXDocument;
            using (Stream partStream = part.GetStream())
            using (XmlReader partXmlReader = XmlReader.Create(partStream))
                partXDocument = XDocument.Load(partXmlReader);
            part.AddAnnotation(partXDocument);
            return partXDocument;
        }

        public static void PutXDocument(this OpenXmlPart part)
        {
            XDocument partXDocument = part.GetXDocument();
            if (partXDocument != null)
            {
                using (Stream partStream = part.GetStream(FileMode.Create, FileAccess.Write))
                using (XmlWriter partXmlWriter = XmlWriter.Create(partStream))
                    partXDocument.Save(partXmlWriter);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (PresentationDocument presentationDocument = PresentationDocument.Open("test.pptx", true))
            {
                XDocument slideXDoc = presentationDocument.PresentationPart.SlideParts.First().GetXDocument();
                XNamespace p = "http://schemas.openxmlformats.org/presentationml/2006/main";
                XNamespace a = "http://schemas.openxmlformats.org/drawingml/2006/main";
                XElement sh = slideXDoc.Root.Element(p + "cSld").Element(p + "spTree").Elements(p + "sp").First();
                XElement r = sh.Element(p + "txBody").Elements(a + "p").Elements(a + "r").Skip(1).FirstOrDefault();
                Console.WriteLine(">{0}<", r.Element(a + "t").Value);
            }
        }
    }

You could, in theory, write some generic code to dig through the LINQ to XML tree, find all elements that contain only significant white space, then traverse the Open XML SDK element tree and set the text of those elements. That is a bit of a mess, but once done, you could use the strongly typed object model of the Open XML SDK 2.0, and the values of such elements would then be correct.
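As a sketch of the first half of that idea, assuming the slide XML has been loaded with whitespace preserved: the hypothetical `WhitespaceRunFinder.FindWhitespaceOnlyText` method below returns each whitespace-only `<a:t>` element's position in document order together with its text. Mapping those positions onto the SDK's element tree and setting the text there is left out; doing it by index assumes both trees enumerate the text elements in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Sketch (hypothetical helper): find DrawingML text elements (<a:t>) whose
// content is whitespace only, and record where they fall in document order.
public static class WhitespaceRunFinder
{
    static readonly XNamespace A = "http://schemas.openxmlformats.org/drawingml/2006/main";

    // Key = zero-based index among all <a:t> descendants, Value = original text.
    // The slide must have been loaded with whitespace preserved; otherwise the
    // text being searched for is already gone.
    public static List<KeyValuePair<int, string>> FindWhitespaceOnlyText(XDocument slide)
    {
        return slide.Descendants(A + "t")
            .Select((t, index) => new KeyValuePair<int, string>(index, t.Value))
            .Where(p => p.Value.Length > 0 && p.Value.All(char.IsWhiteSpace))
            .ToList();
    }
}
```

When testing this against a string, load it with `XDocument.Parse(xml, LoadOptions.PreserveWhitespace)`: the plain `XDocument.Parse(xml)` overload ignores insignificant whitespace and would itself drop the space.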

One technique that makes it easier to use LINQ to XML with Open XML is to pre-atomize XName objects. See http://blogs.msdn.com/b/ericwhite/archive/2008/12/15/a-more-robust-approach-for-handling-xname-objects-in-linq-to-xml.aspx
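A minimal sketch of pre-atomized names for the elements used in the LINQ to XML query above: the classes `A`, `P` and `SlideQueries` are hypothetical, loosely modeled on the idea in the linked post rather than copied from it. Each `XName` is created once in a static field instead of concatenating namespace and local name at every call site.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-atomized names for DrawingML (a:) elements.
public static class A
{
    public static readonly XNamespace ns = "http://schemas.openxmlformats.org/drawingml/2006/main";
    public static readonly XName p = ns + "p";
    public static readonly XName r = ns + "r";
    public static readonly XName t = ns + "t";
}

// Hypothetical pre-atomized names for PresentationML (p:) elements.
public static class P
{
    public static readonly XNamespace ns = "http://schemas.openxmlformats.org/presentationml/2006/main";
    public static readonly XName cSld = ns + "cSld";
    public static readonly XName spTree = ns + "spTree";
    public static readonly XName sp = ns + "sp";
    public static readonly XName txBody = ns + "txBody";
}

public static class SlideQueries
{
    // Same query as Main in the LINQ to XML example, using the static names.
    public static string SecondRunText(XDocument slideXDoc)
    {
        XElement sh = slideXDoc.Root.Element(P.cSld).Element(P.spTree).Elements(P.sp).First();
        XElement r = sh.Element(P.txBody).Elements(A.p).Elements(A.r).Skip(1).FirstOrDefault();
        // The explicit string conversion returns null if the run has no <a:t> child.
        return r == null ? null : (string)r.Element(A.t);
    }
}
```

The lowercase field names deliberately mirror the XML local names, so `A.t` reads like `a:t` in the markup, at the cost of departing from usual .NET naming conventions.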

-Eric

Thanks, Eric, happy to know I'm not just doing it horribly wrong. In case anybody else in this situation is interested, I ended up going with the LINQ to XML-only option. Although the strongly typed object model that the Open XML SDK provides is nice, I'm not prepared to try to anticipate, and subsequently undo, all of the effects this bug may have on presentations created by 'deck designers' far more proficient than I. So far, LINQ to XML is working out fine, and it has actually been extremely simple to port the Open XML SDK code over. – ptrc, Aug 25 '11 at 17:20
I'm happy to see that I'm not crazy, because I spent an hour trying to find out why all the whitespace disappeared. I can't switch to LINQ to XML without wasting hours changing my code, but I'll keep it in mind for future projects. I hope the bug will be fixed in the next version of the SDK. – Arnaud Bessems, Oct 05 '12 at 13:43

score 2 · Answer 2 · answered May 20 '13 at 04:55

Open XML SDK 2.5 corrects this issue.

Yasindu

Woop! So glad this is fixed, as it makes Open XML SDK 2.5 a viable, mature and working solution to an egregious task. Still, the likes of ClosedXML should be available for all products; it would be good to see more like this. – Paul C, Jun 06 '13 at 13:43
