
I am working on a function that creates a simple Word document from a string, using DocumentFormat.OpenXml Version="2.20.0". I don't understand why I can't save the Word document to a memory stream when I can save it to a file.

    using System;
    using System.IO;
    using System.Threading.Tasks;
    using DocumentFormat.OpenXml;                 // WordprocessingDocumentType
    using DocumentFormat.OpenXml.Packaging;       // WordprocessingDocument, MainDocumentPart
    using DocumentFormat.OpenXml.Wordprocessing;  // Document, Body, Paragraph, Run, Text

    public Task<byte[]> ConvertToWordAsync(string text)
    {
        if (string.IsNullOrEmpty(text))
            return Task.FromResult(Array.Empty<byte>());

        using var memoryStream = new MemoryStream();
        using var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document);

        MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
        mainPart.Document = new Document
        {
            Body = new Body()
        };
        Body body = mainPart.Document.Body;
        Paragraph paragraph = new Paragraph();
        Run run = new Run();
        Text bodyText = new Text(text);
        run.Append(bodyText);
        paragraph.Append(run);
        body.Append(paragraph);

        wordDocument.Save();

        return Task.FromResult(memoryStream.ToArray());
    }

When I call this function, the memory stream is always empty. If I change

    using var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document);

to

    using var wordDocument = WordprocessingDocument.Create("C:\\Workspace\\65.docx", WordprocessingDocumentType.Document);

I can open the resulting Word file.

I don't understand why I can't save the same Word file to a memory stream. Does anyone have an idea how to solve this?

rene
  • The body (contents) of an HTTP message has to be consistent with the Content-Type (see: https://www.geeksforgeeks.org/http-headers-content-type/?force_isolation=true). Also, attachments can be MIME, which starts with a new line containing two dashes (see: https://learn.microsoft.com/en-us/previous-versions/office/developer/exchange-server-2010/aa563375(v=exchg.140)?force_isolation=true). An HTTP message must be in HTML format if it is text, or use GZIP (compression), which converts binary to text, or use a Base64 string – jdweng Apr 14 '23 at 12:47
  • 1
    I expect the automagic using block to be an issue here. The using block for the wordDocument needs to end before `return Task.FromResult(memoryStream.ToArray());` or if you don't like that call `wordDocument.Close();` before getting the array from the memory stream. – rene Apr 14 '23 at 13:12
  • 2
    @jdweng the second half of your comment makes no sense as usual, but it doesn't seem to apply at all to this question? Nobody's talking about HTTP here. – CodeCaster Apr 14 '23 at 13:15
  • @rene You are completely right, thank you. I needed to use a try/finally and call wordDocument.Dispose(). Now it works properly :) – Alexandre Sobral Martins Apr 14 '23 at 13:54
  • @CodeCaster I tried, but it did not solve the problem. Thank you for your help anyway :) – Alexandre Sobral Martins Apr 14 '23 at 13:57
  • Yeah, I removed that comment: rewinding a MemoryStream isn't necessary when you call ToArray() on it, only when passing it on to other code that reads it as a stream. – CodeCaster Apr 14 '23 at 14:02
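A minimal sketch of the distinction CodeCaster describes (the `MemoryStreamDemo` class and the tuple names are hypothetical, for illustration only): `ToArray()` copies the whole buffer regardless of `Position`, while `Read()` starts at the current `Position`, which sits at the end right after writing.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class used only for this illustration.
public static class MemoryStreamDemo
{
    public static (int ToArrayLength, int ReadWithoutRewind, int ReadAfterRewind) Inspect()
    {
        using var stream = new MemoryStream();
        byte[] payload = Encoding.UTF8.GetBytes("hello");
        stream.Write(payload, 0, payload.Length); // Position is now 5, the end of the data

        // ToArray() ignores Position and copies everything written so far.
        int toArrayLength = stream.ToArray().Length;

        // Read() starts at the current Position, so without rewinding it finds nothing.
        var buffer = new byte[16];
        int readWithoutRewind = stream.Read(buffer, 0, buffer.Length);

        // Rewind before handing the stream to code that reads it.
        stream.Position = 0;
        int readAfterRewind = stream.Read(buffer, 0, buffer.Length);

        return (toArrayLength, readWithoutRewind, readAfterRewind);
    }
}
```

Calling `MemoryStreamDemo.Inspect()` yields 5 bytes from `ToArray()`, 0 bytes from the first `Read()` and 5 bytes after rewinding.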

1 Answer


tl;dr When you rely on the Dispose call happening at a specific point, don't let a using declaration's implicit scope (the end of the enclosing block) decide where that is.

A using declaration such as `using var foo = new Foo(); foo.Whatever();` still compiles to this code (irrelevant details omitted; see: What are the uses of "using" in C#?):

    var foo = new Foo();
    try
    {
        foo.Whatever();
    }
    finally
    {
        foo.Dispose();
    }

The Dispose call is relevant here.
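A small, self-contained sketch of why that matters (the `Tracer` class and the method names are hypothetical): with a using declaration, the return expression is evaluated before the generated finally block runs Dispose, whereas a using statement disposes at its own closing brace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical IDisposable that records when Dispose runs.
public sealed class Tracer : IDisposable
{
    private readonly List<string> _log;
    public Tracer(List<string> log) => _log = log;
    public void Dispose() => _log.Add("Dispose");
}

public static class UsingDeclarationDemo
{
    // The using declaration's scope ends at the method's closing brace,
    // so the return value is computed BEFORE Dispose runs.
    public static string ReturnInsideScope(List<string> log)
    {
        using var tracer = new Tracer(log);
        log.Add("body");
        return string.Join(",", log); // snapshot: "body"
    }

    // The using statement's scope ends at its own closing brace,
    // so Dispose has already run when the return value is computed.
    public static string ReturnAfterScope(List<string> log)
    {
        using (new Tracer(log))
        {
            log.Add("body");
        } // Dispose runs here
        return string.Join(",", log); // snapshot: "body,Dispose"
    }
}
```

In the first method "Dispose" is still appended to the log, but only after the returned snapshot was taken, which is exactly what happens to `memoryStream.ToArray()` in the question.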

In this code (details omitted):

public Task<byte[]> ConvertToWordAsync(string text)
{
    using var memoryStream = new MemoryStream();
    using var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document);
    
    // details omitted for brevity

    wordDocument.Save();
    
    return Task.FromResult(memoryStream.ToArray());
}

the compiler generates, in effect:

public Task<byte[]> ConvertToWordAsync(string text)
{
    var memoryStream = new MemoryStream();
    try
    {
        var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document);
        try
        {
            // details omitted for brevity

            wordDocument.Save();

            return Task.FromResult(memoryStream.ToArray());
        }
        finally
        {
            wordDocument.Dispose();
        }
    }
    finally
    {
        memoryStream.Dispose();
    }
}

The problem here is that WordprocessingDocument has to write a complete ZIP archive, containing multiple files, to its stream to form a valid Open XML package. It only does so once no more calls to Save() are expected, that is, when either Close() or Dispose() is called.
Before that, the stream is incomplete at best.

Because of where the compiler emitted the finally blocks containing the Dispose calls, memoryStream was nowhere near complete when ToArray() was called. It only became complete as your method returned, and by then no code was left to capture that data.

You say you solved the issue by calling Dispose explicitly. That works. Alternatively, fall back to the classic using statement, whose scope is explicit and holds no surprises:

using var memoryStream = new MemoryStream();
using (var wordDocument = WordprocessingDocument.Create(memoryStream, WordprocessingDocumentType.Document))
{
    // details omitted for brevity

    wordDocument.Save();
} // wordDocument.Dispose() is called here
return Task.FromResult(memoryStream.ToArray());
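Since the Open XML SDK is a NuGet dependency, here is a sketch that reproduces the same pitfall with System.IO.Compression.ZipArchive instead. It is a stand-in, not the SDK itself: ZipArchive likewise only writes the ZIP central directory when it is disposed. The class, method, and entry names are hypothetical. `Buggy()` mirrors the question, `Fixed()` uses the using statement from this answer, and `ExplicitDispose()` uses the try/finally approach from the comments.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical demo class; ZipArchive stands in for WordprocessingDocument.
public static class ZipDisposeDemo
{
    private static void AddEntry(ZipArchive archive)
    {
        ZipArchiveEntry entry = archive.CreateEntry("document.txt");
        using var writer = new StreamWriter(entry.Open());
        writer.Write("Hello");
    }

    // Mirrors the question: ToArray() runs before the archive is disposed.
    public static byte[] Buggy()
    {
        using var memoryStream = new MemoryStream();
        using var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, leaveOpen: true);
        AddEntry(archive);
        return memoryStream.ToArray(); // no central directory written yet
    }

    // Using statement: the archive's scope ends before the bytes are read.
    public static byte[] Fixed()
    {
        using var memoryStream = new MemoryStream();
        using (var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, leaveOpen: true))
        {
            AddEntry(archive);
        } // archive.Dispose() writes the central directory here
        return memoryStream.ToArray();
    }

    // Explicit Dispose in a finally block.
    public static byte[] ExplicitDispose()
    {
        using var memoryStream = new MemoryStream();
        var archive = new ZipArchive(memoryStream, ZipArchiveMode.Create, leaveOpen: true);
        try
        {
            AddEntry(archive);
        }
        finally
        {
            archive.Dispose();
        }
        return memoryStream.ToArray();
    }

    // True if the bytes open as a ZIP archive that contains the entry.
    public static bool IsReadableZip(byte[] bytes)
    {
        try
        {
            using var archive = new ZipArchive(new MemoryStream(bytes), ZipArchiveMode.Read);
            return archive.GetEntry("document.txt") != null;
        }
        catch (InvalidDataException)
        {
            return false;
        }
    }
}
```

Passing the three results to `IsReadableZip` should give false, true, true: the bytes from `Buggy()` lack the central directory, so they cannot be opened as an archive.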

rene