
I have a new project where I need to generate a DOCX. My client has provided me with an existing DOCX in which I need to replace some placeholders with customer data from the database. As if this weren’t challenging enough, certain parts are optional depending on conditions in the customer data, so I will have to provide some logic to omit those parts of the DOCX entirely.

After way too much research and some POCs, I’ve come across a new approach. I saved the DOCX as a Word XML Document. This creates one big XML file with everything in it; even the images are included, encoded as base64. I then copied the content of the XML file into a T4 template. This allows me to add dynamic content based on the customer data and to generate a Word XML Document in my code as one large string.

But now I’m stuck at creating a DOCX again from the Word XML Document string. I’ve tried using the Open XML SDK but can’t find any real documentation on how to do this. After some experimentation I ended up with the code below, but it fails to parse the XML (Data at the root level is invalid. Line 1, position 1).

As a second attempt, I tried a suggestion from another post, but this results in a different exception (The XML has invalid content and cannot be constructed as an element. (Parameter 'outerXml')).

Is there a way to do this, or should I just abandon the T4 template and try another approach? Another problem with the T4 template is the size of some of the images: each one results in a long base64 string that generates far too many lines. I guess I could replace the images with placeholders and swap them in just before I create the XML...

    public FileData CreateDocx(string title, string xml)
    {
        using (MemoryStream generatedDocument = new MemoryStream())
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
            {
                var mainPart = package.AddMainDocumentPart();
                //First attempt
                //new Document(xml).Save(mainPart);

                var doc = new XmlDocument();
                doc.LoadXml(xml);
                new Document(doc.OuterXml).Save(mainPart);
            }

            return new FileData(title, generatedDocument.ToArray());
        }
    }
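
A note on the two exceptions above. Word's "Word XML Document" format is Flat OPC: its root element is `pkg:package`, not `w:document`. That is consistent with the second error, because the `Document` class models only the `w:document` element. The sketch below assumes an Open XML SDK version that provides `WordprocessingDocument.FromFlatOpcString` (available in recent 2.x releases). `FlatOpcConverter` and `ToDocxBytes` are hypothetical names. Stripping a leading byte order mark and whitespace is only a guess at one possible cause of the "Data at the root level is invalid" error with T4 output, not a confirmed diagnosis.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper: turns a Flat OPC ("Word XML Document") string into DOCX bytes.
public static class FlatOpcConverter
{
    public static byte[] ToDocxBytes(string flatOpcXml)
    {
        // Assumption: template output may start with a BOM or blank lines,
        // which an XML parser rejects before the XML declaration.
        string cleaned = flatOpcXml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        using (var stream = new MemoryStream())
        {
            // Builds a regular OPC package (a zipped DOCX) in the stream from the flat XML.
            using (WordprocessingDocument.FromFlatOpcString(cleaned, stream, true))
            {
                // Nothing to edit here; disposing flushes the package into the stream.
            }

            return stream.ToArray();
        }
    }
}
```

With a helper like this, the body of `CreateDocx` could in principle reduce to `return new FileData(title, FlatOpcConverter.ToDocxBytes(xml));`.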
Beejee
  • I think [microsoft.office.interop.word](https://learn.microsoft.com/en-us/dotnet/api/microsoft.office.interop.word?view=word-pia) has all the stuff that you need. – Luuk Feb 06 '21 at 17:44
  • Perhaps try DocX: https://github.com/xceedsoftware/DocX It has a much cleaner API than Microsoft – Thomas Weller Feb 06 '21 at 18:39
  • Isn't it discouraged to use this on a web server? I would need to have an Office license installed on it to make use of the library. – Beejee Feb 06 '21 at 18:40
  • @Thomas I could try it out as it would fix some problems but how would I be able to remove sections from the document based on a condition? Do you have an example I could use? – Beejee Feb 06 '21 at 18:50
  • Can you post an image of the existing .docx (or one that has a similar format)? What are the "placeholders" and how do you determine if something is a placeholder? – Tu deschizi eu inchid Feb 06 '21 at 19:05
  • I would just add some unique text like {{Customer name}} as a placeholder. For the most part it consists of paragraphs and a lot of images. Some paragraphs and images (sections of the document) should be removed if the customer is not of the correct type. At the end there is also a table where, based on the order, some rows should be removed. Furthermore, the whole document has a page header and footer on each page except the first. The first page is an image with some overlay text. – Beejee Feb 06 '21 at 19:18
  • I would use openXML Powertools or Aspose.Words. https://github.com/EricWhiteDev/Open-Xml-PowerTools – Herr Kater Feb 06 '21 at 19:24
  • @Herr Kater, I've already tried out PowerTools but couldn't work out how to use it to my advantage. Could you provide more details about how to use it in my scenario? – Beejee Feb 06 '21 at 19:33
  • This explains how to prepare your template: http://www.ericwhite.com/blog/getting-started-with-open-xml-powertools-documentassembler/ This one shows how to transform the template: https://github.com/EricWhiteDev/Open-Xml-PowerTools/blob/vNext/OpenXmlPowerToolsExamples/DocumentAssembler/DocumentAssembler.cs – Herr Kater Feb 07 '21 at 16:24

1 Answer


Based on the feedback of Thomas Weller, I tried out DocX. This library makes it way easier to open/duplicate/create DOCX files. After some research I totally changed my approach. I ended up using the existing DOCX as a template.

First of all, I added placeholders to the paragraphs where I needed to inject data from the database. For this I used something like {{CustomerName}}. Using the ReplaceText method, I was able to swap all the placeholders with the correct data.
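
When there are many placeholders, the replacement step can be driven by a dictionary instead of one `ReplaceText` call per field. This is only a minimal sketch: `PlaceholderFiller` and `Fill` are hypothetical names, and it assumes a DocX version in which the `DocX` class lives in the `Xceed.Words.NET` namespace and the two-argument `ReplaceText` overload is available.

```csharp
using System.Collections.Generic;
using Xceed.Words.NET;

// Hypothetical helper: replaces every {{Key}} placeholder with its value.
public static class PlaceholderFiller
{
    public static void Fill(DocX document, IReadOnlyDictionary<string, string> values)
    {
        foreach (var pair in values)
        {
            // A null value is replaced by an empty string, so the placeholder
            // never survives into the generated document.
            document.ReplaceText("{{" + pair.Key + "}}", pair.Value ?? string.Empty);
        }
    }
}
```

A call could look like `PlaceholderFiller.Fill(document, new Dictionary<string, string> { ["CustomerName"] = myData.CustomerName });`.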

After that I added sections to the document. This can be done easily in Word by using this guide. Once the sections were in place, I also added a placeholder to mark each one, since you can’t name a section in Word. So I ended up with placeholders at the beginning of the sections like {{SectionNationalCustomer}}. This allowed me to look up a section with a LINQ query that searches all sections for one with a paragraph containing my placeholder.

Once I had collected the conditional sections, I was able to ‘remove’ them by looping over their SectionParagraphs and removing each paragraph. Completely removing the section itself doesn’t seem possible. When a section needed to stay visible, it was only a matter of replacing the placeholder with an empty string.

The final thing I needed was to find the correct table in the document. I tried the same approach as before by using a new section, but it seems the Tables collection of the Section object is always empty, even if there is a table in it. So I needed another approach. Again I used a unique placeholder, this time in the first column of the table, like {{TableQuotation}}. Then I did the same as with the sections and wrote a LINQ query to select the right table by looking for a paragraph with the right placeholder.
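
Once the table has been found, rows can also be removed conditionally rather than by a fixed index. The following is a rough sketch, not the code I actually used: `TableRowPruner` and the `{{RemoveRow}}` marker are hypothetical, row 0 is assumed to be a header, and the `Table` type is assumed to come from the `Xceed.Document.NET` namespace (older DocX versions use a different namespace).

```csharp
using System.Linq;
using Xceed.Document.NET;

// Hypothetical helper: removes body rows whose text contains a marker.
public static class TableRowPruner
{
    // Returns the number of rows removed. Row 0 is treated as a header and kept.
    public static int RemoveRowsContaining(Table table, string marker)
    {
        int removed = 0;

        // Walk backwards so removing a row never shifts an index still to be visited.
        for (int i = table.RowCount - 1; i >= 1; i--)
        {
            bool marked = table.Rows[i].Paragraphs.Any(p => p.Text.Contains(marker));
            if (marked)
            {
                table.RemoveRow(i);
                removed++;
            }
        }

        return removed;
    }
}
```

The marker would be written into the rows that must disappear (for example by replacing a per-row placeholder with `{{RemoveRow}}` when the order does not include that item) before calling `TableRowPruner.RemoveRowsContaining(table, "{{RemoveRow}}")`.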

After all this I ended up with some code that looked very similar to this:

    using (var memoryStream = new MemoryStream())
    {
        // Load the template document and work on a copy
        using (var template = DocX.Load("MyTemplate.docx"))
        {
            var document = template.Copy();

            // Swap placeholder with data
            document.ReplaceText("{{CustomerName}}", myData.CustomerName);

            // Hide or show section based on condition
            var section = document.Sections.FirstOrDefault(s => s.SectionParagraphs.Any(p => p.Text.StartsWith("{{SectionNationalCustomer}}")));
            if (myData.Customer.Address.National == true)
            {
                // Remove placeholder when section stays visible
                document.ReplaceText("{{SectionNationalCustomer}}", "");
            }
            else if (section != null)
            {
                // Remove contents of section; iterate over a snapshot so that
                // removing paragraphs cannot disturb the enumeration
                foreach (var paragraph in section.SectionParagraphs.ToList())
                {
                    document.RemoveParagraph(paragraph);
                }
            }

            // Find and edit table (FirstOrDefault returns null when no table has the placeholder)
            var table = document.Tables.FirstOrDefault(t => t.Paragraphs.Any(p => p.Text.Contains("{{TableQuotation}}")));
            document.ReplaceText("{{TableQuotation}}", "");
            if (table != null)
            {
                table.RemoveRow(1);
            }

            document.SaveAs(memoryStream);
        }

        return memoryStream.ToArray();
    }
Beejee