
My approach is pretty simple. I download two files from the internet (served as .docx files) and get the byte[] for each of them. I then call Append() on the body of the destination file, appending the cloned Body of the source file. Below is my code:

using Microsoft.AspNetCore.Mvc;
using Newtonsoft.Json;
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml;
using System.Collections.Generic;

namespace WhatApp.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class DocController : ControllerBase
    {
        [HttpGet]
        public async Task<IActionResult> Get()
        {
            byte[] file1 = await GetBytes("https://dummyfileserver.io/file/1");
            byte[] file2 = await GetBytes("https://dummyfileserver.io/file/2");

            byte[] result = MergeFiles(file1, file2);

            // To return the file
            return File(result, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
        }

        private async Task<byte[]> GetBytes(string url)
        {
            using HttpClient httpClient = new HttpClient();
            var res = await httpClient.GetAsync(url);
            if (res.IsSuccessStatusCode)
            {
                // Read the whole body; a single Stream.Read call may return fewer bytes than requested.
                return await res.Content.ReadAsByteArrayAsync();
            }
            throw new Exception();
        }

        private byte[] MergeFiles(byte[] dest, byte[] src)
        {
            using (MemoryStream destMem = new MemoryStream())
            {
                destMem.Write(dest, 0, (int)dest.Length);
                using (WordprocessingDocument mywDoc =
                    WordprocessingDocument.Open(destMem, true))
                {
                    mywDoc.MainDocumentPart.Document.Body.InsertAt(new PageBreakBefore(), 0);

                    mywDoc.MainDocumentPart.Document.Body.Append(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));

                    var srcElements = GetSourceDoc(src);
                    mywDoc.MainDocumentPart.Document.Body.Append(srcElements);
                    mywDoc.Close();
                }
                return destMem.ToArray();
            }
        }

        private OpenXmlElement GetSourceDoc(byte[] src)
        {
            using (MemoryStream srcMem = new MemoryStream())
            {
                srcMem.Write(src, 0, (int)src.Length);
                using (WordprocessingDocument srcDoc =
                    WordprocessingDocument.Open(srcMem, true))
                {
                    OpenXmlElement elem = srcDoc.MainDocumentPart.Document.Body.CloneNode(true);
                    srcDoc.Close();
                    return elem;
                }
            }
        }
    }
}

The resulting file does not show the images properly in the region where file2 is added (the second part of the response document).

What could be the reason for this problem, and how can I solve it?

Another issue I noticed is that debugging stops forcefully after I save the file to my local machine. What could be the cause of that?

user2129013

2 Answers


I understand that you need to combine two Word files in ASP.NET Core. I doubt that AltChunks are a good fit here, since your response is a FileContentResult built from a byte[] array. Indeed, OpenXML does not hide the complexity, so I recommend considering OpenXML PowerTools instead. It is now maintained by Eric White and has a NuGet package for .NET Standard as well. Install the package and modify your MergeFiles() method as below:

private byte[] MergeFiles(byte[] dest, byte[] src)
{
    var sources = new List<Source>();

    // Wrap each document in a WmlDocument. The first constructor argument is
    // the fileName parameter; the stream length is only used as a placeholder.
    using var destMem = new MemoryStream();
    destMem.Write(dest, 0, dest.Length);
    sources.Add(new Source(new WmlDocument(destMem.Length.ToString(), destMem), true));

    using var srcMem = new MemoryStream();
    srcMem.Write(src, 0, src.Length);
    sources.Add(new Source(new WmlDocument(srcMem.Length.ToString(), srcMem), true));

    // Build one document from both sources, in the order they were added.
    var mergedDoc = DocumentBuilder.BuildDocument(sources);

    using var mergedFileStream = new MemoryStream();
    mergedDoc.WriteByteArray(mergedFileStream);

    return mergedFileStream.ToArray();
}

Source, DocumentBuilder, and WmlDocument come from the OpenXmlPowerTools namespace. Good luck!

Isham Mohamed
  • Let me try this out. Since OpenXML Power tools seems like a wrapper around OpenXML, my fear is that it would take more response time as I am building an API which I expect to send a quick response. – user2129013 Mar 12 '21 at 03:33
  • @user2129013 This is document manipulation, which certainly requires time. For this type of scenario you can implement the `asynchronous request-reply` pattern in your API. Microsoft has great documentation on this at https://learn.microsoft.com/en-us/azure/architecture/patterns/async-request-reply, which involves Azure Functions etc. If you are using .NET 5, you can use https://github.com/IshamMohamed/synca to generate in-process asynchronous request-reply API endpoints. – Isham Mohamed Mar 12 '21 at 03:39
  • It worked. Thanks for the cloud pattern suggestion. I can consider the way documented in Microsoft Documentation. – user2129013 Mar 12 '21 at 04:18

Images are stored separately from the document body, as their own parts in the package, so you will need to copy them manually as well. You will also need to fix all the relationships within OpenXml that point to them. Unfortunately, OpenXML is not trivial and the SDK does not hide that complexity.
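To make the point about images and relationships concrete, here is a minimal sketch of what copying them by hand could look like with the Open XML SDK. The class and method names (ImageAwareAppend, Append) are hypothetical, and the sketch assumes the source document only contains DrawingML pictures referenced through a:blip elements; VML images, styles, numbering, headers, and footers are not handled.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using A = DocumentFormat.OpenXml.Drawing;

// Hypothetical helper, not part of the original question or of the SDK.
public static class ImageAwareAppend
{
    public static byte[] Append(byte[] dest, byte[] src)
    {
        // Expandable stream for the destination, read-only stream for the source.
        using var destMem = new MemoryStream();
        destMem.Write(dest, 0, dest.Length);
        using var srcMem = new MemoryStream(src, false);

        using (var destDoc = WordprocessingDocument.Open(destMem, true))
        using (var srcDoc = WordprocessingDocument.Open(srcMem, false))
        {
            var destMain = destDoc.MainDocumentPart;
            var srcMain = srcDoc.MainDocumentPart;
            var destBody = destMain.Document.Body;

            // New content must go before the body's final section properties.
            var lastSection = destBody.Elements<SectionProperties>().LastOrDefault();

            foreach (var child in srcMain.Document.Body.ChildElements)
            {
                // Skip the source's own section properties.
                if (child is SectionProperties) continue;

                var clone = child.CloneNode(true);

                // Copy every referenced image part and point the clone at the new relationship id.
                foreach (var blip in clone.Descendants<A.Blip>())
                {
                    string oldId = blip.Embed?.Value;
                    if (oldId == null) continue;

                    var srcImage = srcMain.GetPartById(oldId) as ImagePart;
                    if (srcImage == null) continue;

                    var newImage = destMain.AddImagePart(srcImage.ContentType);
                    using (var imageStream = srcImage.GetStream())
                    {
                        newImage.FeedData(imageStream);
                    }
                    blip.Embed = destMain.GetIdOfPart(newImage);
                }

                if (lastSection != null) destBody.InsertBefore(clone, lastSection);
                else destBody.Append(clone);
            }

            destMain.Document.Save();
        }

        return destMem.ToArray();
    }
}
```

Cloning the Body alone carries the r:embed ids over unchanged, but in the destination package those ids either do not exist or point at different parts, which would explain pictures that fail to render.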

However, if you know that your Word document will be opened by software that understands AltChunks (e.g. MS Word), there might be an easy way for you: I suggest looking at the question "Merge multiple word documents into one Open Xml".
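As a rough illustration of the AltChunk idea, the sketch below embeds the second document as an alternative format import part and leaves the actual merge to the consuming application. The names AltChunkMerge and Merge are hypothetical, and working on in-memory byte arrays is an assumption made to match the question's setup.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the original answer or of the SDK.
public static class AltChunkMerge
{
    public static byte[] Merge(byte[] dest, byte[] src)
    {
        using var destMem = new MemoryStream();
        destMem.Write(dest, 0, dest.Length);

        using (var destDoc = WordprocessingDocument.Open(destMem, true))
        {
            var mainPart = destDoc.MainDocumentPart;
            var body = mainPart.Document.Body;

            // Relationship ids must be unique within the part and start with a letter.
            string chunkId = "AltChunk" + Guid.NewGuid().ToString("N");

            // Store the whole source .docx inside the destination package.
            var chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, chunkId);
            using (var srcMem = new MemoryStream(src))
            {
                chunk.FeedData(srcMem);
            }

            // Reference the embedded document from the body, before the final section properties.
            var altChunk = new AltChunk { Id = chunkId };
            var lastSection = body.Elements<SectionProperties>().LastOrDefault();
            if (lastSection != null) body.InsertBefore(altChunk, lastSection);
            else body.Append(altChunk);

            mainPart.Document.Save();
        }

        return destMem.ToArray();
    }
}
```

The returned bytes still contain the second document as an embedded chunk; it only becomes regular content once an application that understands AltChunks opens and re-saves the file.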

From my experience, how well this works depends heavily on the complexity of your documents and the intended usage. Opening the result with MS Word is usually fine, but converting it to PDF on the server (with a third-party library), for example, might not give the intended results.

Akade
  • Hi, I tried the https://stackoverflow.com/a/18352504/2129013 approach, but it seems to just overwrite the document. My requirement is to merge these files from a web server, and I guess these approaches are mostly suitable for Windows processes only? – user2129013 Mar 12 '21 at 03:32