I am trying to create a Word document from a Word template in my C# application using the Open XML SDK. Here is my code so far:

DirectoryInfo tempDir = new DirectoryInfo(Server.MapPath("~\\Files\\WordTemplates\\"));

DirectoryInfo docsDir = new DirectoryInfo(Server.MapPath("~\\Files\\FinanceDocuments\\"));

string ype = "test Merge"; // if the ype string contains spaces, I get the error shown below
string sourceFile = tempDir + "\\PaymentOrderTemplate.dotx";
string destinationFile = docsDir + "\\" + "PaymentOrder.doc";

// Create a copy of the template file and open the copy 
File.Copy(sourceFile, destinationFile, true);

// Create the key/value pairs: each key is the text to search for in the
// document, and each value is the text that replaces it.
Dictionary<string, string> keyValues = new Dictionary<string, string>();
keyValues.Add("ype", ype);                
SearchAndReplace(destinationFile, keyValues);
Process.Start(destinationFile);

And the SearchAndReplace function:

public static void SearchAndReplace(string document, Dictionary<string, string> dict)
{
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
    {
        string docText = null;

        using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
        {
            docText = sr.ReadToEnd();
        }

        foreach (KeyValuePair<string, string> item in dict)
        {
            Regex regexText = new Regex(item.Key);
            docText = regexText.Replace(docText, item.Value);
        }

        using (StreamWriter sw = new StreamWriter(
                  wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
        {
            sw.Write(docText);
        }
    }
}

But when I try to open the exported file I get this error:

XML parsing error

Location: Part: /word/document.xml, line: 2, Column: 2142

Document.xml first lines:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>


<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" 
xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid wp14">

<w:body>

<w:tbl>

<w:tblPr>

<w:tblW w:w="10348" w:ttest Merge="dxa"/>

<w:tblInd w:w="108" w:ttest Merge="dxa"/>

<w:tblBorders>

Edit: I found out that the problem occurred because I was using merge fields in the Word template. If I use plain text it works, but in that case it will be slow, because it has to check every single word in the template and replace it if it matches. Is it possible to do this another way?

aggicd
  • Even though you are kind of *new* to SO, please, I am begging you to format your code properly. – L. Guthardt Nov 16 '17 at 12:01
  • What's on line 2 of the XML document? – GrandMasterFlush Nov 16 '17 at 12:02
  • @GrandMasterFlush I found that if the replacement string contains spaces I get this error, otherwise it works fine – aggicd Nov 16 '17 at 12:12
  • I notice you're using the '.doc' extension for your generated document - shouldn't it be '.docx' if it's coming from a '.dotx' template? I'd guess your issue is around XML encoding. – GrandMasterFlush Nov 16 '17 at 12:15
  • @GrandMasterFlush if my replacement string does not contain any spaces then it works fine... I don't think it has anything to do with the file extension – aggicd Nov 16 '17 at 12:18
  • If you're generating a .docx, call it a .docx. GMF isn't telling you to change the extension because it will solve the problem; it's just a side note that you're doing something you shouldn't, which may cause a nuisance later down the line. To better explain the error we really need to see the file you made. Rename the `.doc` to `.zip`, open it as an archive, extract the XML document that describes the content of the document, and post the first 5 lines of it so we can tell you why line 2 isn't parsing – Caius Jard Nov 16 '17 at 12:41
  • @CaiusJard check my edited post – aggicd Nov 16 '17 at 12:55
  • I took the XML you posted and removed the line breaks from it. Column 2142 is around this part: `<w:tblW w:w="10348" w:ttest Merge="dxa"/>`. It looks like there is a space in the attribute name `ttest Merge`. Does this mean anything to you? – Caius Jard Nov 16 '17 at 13:05
  • @CaiusJard as you can see in my code, I want to replace the merge field `ype` in my template document with the string `test Merge`. – aggicd Nov 16 '17 at 13:07
  • @CaiusJard if I use plain strings in my template instead of merge fields it works. Please check my updated post, because using plain strings is not such a good idea – aggicd Nov 16 '17 at 14:44
  • This is related and possibly useful to you - https://stackoverflow.com/questions/28697701/openxml-tag-search/28719853#28719853 – petelids Nov 16 '17 at 23:13

1 Answer

Disclaimer: You seem to be using the Open XML SDK, because your code looks virtually identical to the sample found here: https://msdn.microsoft.com/en-us/library/bb508261(v=office.12).aspx. I have never used this SDK, so this answer is an educated guess at what is happening.

It seems that the operation you're carrying out on this Word document is affecting parts of the document that you didn't intend.

I believe that calling wordDoc.MainDocumentPart.GetStream() just gives you more or less raw access to the XML of the document. You are then treating it as a plain text file and carrying out a list of straight text replacements on it. I think that is the likely cause of the problem: you intend to edit the document text, but you are accidentally damaging the XML node structure in the process.

By way of an example, here is a simple HTML document:

<html>
 <head><title>Damage report</title></head>
 <body>
  <p>The soldier was shot once in the body and twice in the head</p>
 </body>
</html>

You decide to run a find/replace to make the places where the soldier was shot a bit more specific:

var html = File.ReadAllText(@"c:\my.html");
html = html.Replace("body", "chest");
html = html.Replace("head", "forehead");
File.WriteAllText(@"c:\my.html", html);

The only problem is that your document is now ruined:

<html>
 <forehead><title>Damage report</title></forehead>
 <chest>
  <p>The soldier was shot once in the chest and twice in the forehead</p>
 </chest>
</html>

A browser can no longer make sense of it (it may still be well-formed markup, but it is meaningless) because the replacement operation also changed the tag names.

You're replacing "ype" with "test Merge", but this also clobbers every occurrence of the word "type", which is very likely to appear in XML attribute or element names, and turns it into "ttest Merge". Your posted XML shows exactly this: the attribute w:type="dxa" has become w:ttest Merge="dxa", and an XML attribute name cannot contain a space.
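
The effect can be reproduced on a single line of markup without opening Word at all. The following is a minimal sketch: the class and method names (ClobberDemo, NaiveReplace) are made up for illustration, and the input is the w:tblW element from the question with its original w:type attribute restored.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ClobberDemo
{
    // Hypothetical helper: the same Regex replacement the question runs,
    // applied to raw markup instead of to the document text only.
    public static string NaiveReplace(string xml, string key, string value)
    {
        return new Regex(key).Replace(xml, value);
    }

    public static void Main()
    {
        string xml = "<w:tblW w:w=\"10348\" w:type=\"dxa\"/>";

        // The "ype" inside the attribute name "type" is replaced as well.
        Console.WriteLine(NaiveReplace(xml, "ype", "test Merge"));
        // prints: <w:tblW w:w="10348" w:ttest Merge="dxa"/>

        // Without a space the attribute name is still damaged, but the
        // markup stays well-formed, so no XML parsing error is raised.
        Console.WriteLine(NaiveReplace(xml, "ype", "testMerge"));
        // prints: <w:tblW w:w="10348" w:ttestMerge="dxa"/>
    }
}
```

The second call probably explains why a replacement value without spaces appeared to work: the attribute name is wrong in both cases, but only the version with a space stops being parseable XML.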

To change the text content of an XML document correctly, parse the text into an XML document object model, iterate over the nodes, alter their text, and re-serialize the whole thing back to XML text. The Open XML SDK does seem to provide ways to do this, because you can treat a document as a collection of class object instances and write things like this code snippet (also from MSDN):

// Create a Wordprocessing document. 
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document)) 
{ 
   // Add a new main document part. 
   MainDocumentPart mainPart = myDoc.AddMainDocumentPart(); 
   //Create DOM tree for simple document. 
   mainPart.Document = new Document(); 
   Body body = new Body(); 
   Paragraph p = new Paragraph(); 
   Run r = new Run(); 
   Text t = new Text("Hello World!"); 
   //Append elements appropriately. 
   r.Append(t); 
   p.Append(r); 
   body.Append(p); 
   mainPart.Document.Append(body); 
   // Save changes to the main document part. 
   mainPart.Document.Save(); 
}
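
Applied to the search-and-replace problem, the same object model can be used to touch only the text nodes (the w:t elements) and leave element and attribute names alone. What follows is an untested sketch rather than a definitive implementation: it assumes the DocumentFormat.OpenXml package is referenced, the names SafeReplace and SearchAndReplaceText are made up here, and it only covers the main document body (not headers or footers).

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SafeReplace
{
    // Hypothetical replacement for the question's SearchAndReplace:
    // keys are replaced only inside Text (w:t) elements.
    public static void SearchAndReplaceText(string path, Dictionary<string, string> dict)
    {
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, true))
        {
            Document doc = wordDoc.MainDocumentPart.Document;

            // Snapshot the text nodes first, then edit their content.
            foreach (Text text in doc.Descendants<Text>().ToList())
            {
                foreach (KeyValuePair<string, string> item in dict)
                {
                    if (text.Text.Contains(item.Key))
                    {
                        text.Text = text.Text.Replace(item.Key, item.Value);
                    }
                }
            }

            // Write the modified DOM back to the main document part.
            doc.Save();
        }
    }
}
```

Two caveats apply. Word may split a placeholder across several runs, in which case no single Text node contains the whole key and this loop will not find it. Merge fields also keep their field code (for example MERGEFIELD ype) separately from the displayed text, so this sketch changes what is displayed but does not remove the field itself.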

You should look for another way to access the document elements, one that does not use streams or direct low-level XML access. Something like these:

https://blogs.msdn.microsoft.com/brian_jones/2009/01/28/traversing-in-the-open-xml-dom/ 
https://www.gemboxsoftware.com/document/articles/find-replace-word-csharp

Or possibly start with a related SO question such as "Search And Replace Text in OPENXML (Added file)" (though the answer you need may be in something linked from that question).

Caius Jard