3

I have a question about how to save an XmlDocument as a Word file. I want to open the Word file, manipulate the underlying XML structure using the XmlDocument class, and then save it back to the Word file. This is what I'm currently doing:

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(@"E:\HelloWorld.docx", true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    var xmlDoc = new XmlDocument();
    using (Stream partStream = mainPart.GetStream())
    using (XmlReader partXmlReader = XmlReader.Create(partStream))
        xmlDoc.Load(partXmlReader);
    //xml node manipulation here

    xmlDoc.Save(@"E:\HelloWorld.docx");
}

This results in a corrupt document, however. What is the proper way to do this?

John Baum
  • Could you please comment on what is corrupt and what you expect: HelloWorld.Test will be an XML file, not docx, so is the XML invalid, is HelloWorld.docx corrupt, or do you expect HelloWorld.Test to be a Word document? – Alexei Levenkov Mar 23 '12 at 16:12
  • Sorry, that was a typo. I am trying to open a docx, use the XmlDocument to extract the XML structure, modify it, and then write it back to another docx file. So HelloWorld should be a docx. – John Baum Mar 23 '12 at 16:21

3 Answers

3

An OpenXML document is more than just an XML file (actually, it's a ZIP archive containing several files, XML files among them).
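To see the ZIP structure for yourself, here is a minimal sketch that lists the entries of a package using only System.IO.Compression. The names DocxInspector and ListEntries are made up for this illustration and are not part of the Open XML SDK.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxInspector
{
    // Hypothetical helper: returns the names of all entries stored in the package.
    // The caller keeps ownership of the stream (leaveOpen: true).
    public static List<string> ListEntries(Stream packageStream)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

Called on a stream from File.OpenRead(@"E:\HelloWorld.docx"), this would typically show entries such as [Content_Types].xml and word/document.xml, which is why overwriting the whole .docx with a single XML file breaks it.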

What you should do is modify your WordprocessingDocument and then save it (which is done automatically at the end of the using block), rather than save the XML file that represents only part of the document:

using (var wordDoc = WordprocessingDocument.Open(fileName, true))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;

    using (Stream partStream = mainPart.GetStream())
    {
        var xmlDoc = new XmlDocument();

        using (XmlReader partXmlReader = XmlReader.Create(partStream))
            xmlDoc.Load(partXmlReader);

        //xml node manipulation here

        partStream.Position = 0;

        using (XmlWriter partXmlWriter = XmlWriter.Create(partStream))
            xmlDoc.Save(partXmlWriter);
    }
}
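One thing to watch when writing back from Position 0: on an ordinary seekable stream, if the new XML is shorter than the old (for example after deleting runs), the leftover bytes of the old content stay at the end. A rough sketch of a variant that truncates first is below. XmlStreamHelper and RewriteXml are hypothetical names, and it assumes the stream is seekable and supports SetLength; whether the stream returned by mainPart.GetStream() does is an assumption here, not something confirmed in this thread.

```csharp
using System;
using System.IO;
using System.Xml;

public static class XmlStreamHelper
{
    // Hypothetical helper: loads XML from a seekable stream, applies 'edit',
    // then truncates the stream and writes the modified XML back.
    public static void RewriteXml(Stream stream, Action<XmlDocument> edit)
    {
        var xmlDoc = new XmlDocument();
        stream.Position = 0;
        using (XmlReader reader = XmlReader.Create(stream))
            xmlDoc.Load(reader);

        edit(xmlDoc); // xml node manipulation here

        // Assumption: the stream supports SetLength. Truncating ensures
        // no bytes of the old, longer content survive the rewrite.
        stream.SetLength(0);
        stream.Position = 0;
        using (XmlWriter writer = XmlWriter.Create(stream))
            xmlDoc.Save(writer);
    }
}
```

Inside the using block it would be called roughly as XmlStreamHelper.RewriteXml(partStream, doc => { /* manipulation */ });.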
svick
  • I'm not sure whether it's this method or something wrong with my XML manipulation code that's at fault, since I can't see any changes. Basically, in the manipulation part, I am trying to delete several runs from the document, but they still show up at the end. – John Baum Mar 23 '12 at 17:41
  • @JohnBaum, this code works for me. Are you sure you're using it exactly as I wrote it (reset the `Position` of the stream and then write the XML to it)? And we can't help you with your code unless you show it to us. You should probably ask another question and put your code there. – svick Mar 23 '12 at 18:13
  • Seems correct to me. Have you debugged it to see whether it actually removed the nodes you think it should be removing? – svick Mar 23 '12 at 18:57
  • How would I debug that? The nodes are being removed from the parent, if that's what you mean. – John Baum Mar 23 '12 at 19:16
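One possible way to check the removal, sketched under assumptions: count the w:r (run) elements before and after the manipulation. RunCounter and CountRuns are hypothetical names. The XPath needs an XmlNamespaceManager, because the elements live in the WordprocessingML namespace and an unprefixed query would not match them.

```csharp
using System.Xml;

public static class RunCounter
{
    // Main WordprocessingML namespace used by word/document.xml.
    public const string WordNs =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: counts all w:r elements in the loaded document.
    public static int CountRuns(XmlDocument xmlDoc)
    {
        var ns = new XmlNamespaceManager(xmlDoc.NameTable);
        ns.AddNamespace("w", WordNs);
        return xmlDoc.SelectNodes("//w:r", ns).Count;
    }
}
```

If the count drops in memory but the runs still appear in Word, the problem is more likely in how the XML is written back than in the node removal itself.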
0

If you have successfully performed the manipulation, you can then save back to the file by calling Close() on your wordDoc variable. According to MSDN, this also saves the content.

Philipp Aumayr
0

HelloWorld.docx will be an XML file, not a DOCX, if you save it like this:

var xmlDoc = new XmlDocument();
... 
xmlDoc.Save(@"E:\HelloWorld.docx");

What you want is to either create a new WordprocessingDocument or update the existing one with the XML you've modified. Something along these lines:

using (StreamWriter sw =
    new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
{
    xmlDoc.Save(sw);
}

See more samples in the MSDN: http://msdn.microsoft.com/en-us/library/documentformat.openxml.wordprocessing.document.aspx

Alexei Levenkov