I am trying to merge multiple documents into a single one and then open the resulting document and process it further.

"ChunkId" is a property that is incremented every time this method is called so that each chunk gets a unique id. I followed the example from this site. This is the code used to merge multiple documents (using altChunks): `

private void MergeDocument(string mergePath, bool appendPageBreak)
    {
        if (!File.Exists(mergePath))
        {
            Log.Warn(string.Format("Document: \"{0}\" was not found.", mergePath));
            return;
        }

        ChunkId++;
        var altChunkId = "AltChunkId" + ChunkId;

        var mainDocPart = DestinationDocument.MainDocumentPart;
        if (mainDocPart == null)
        {
            DestinationDocument.AddMainDocumentPart();
            mainDocPart = DestinationDocument.MainDocumentPart;
            if (mainDocPart.Document == null)
                mainDocPart.Document = new Document { Body = new Body() };
        }

        try
        {
            var chunk = mainDocPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            if (chunk != null)
                using (var ms = new FileStream(mergePath, FileMode.Open, FileAccess.Read)) // the source is only read
                {
                    chunk.FeedData(ms);
                }
            else
            {
                Log.Error(string.Format("Merge - Failed to create chunk document based on \"{0}\".", mergePath));
                return; // failed to create chunk document, return from merge method

            }
        }
        catch (Exception e)
        {
            Log.Error(string.Format("Merge - Failed to insert chunk document based on \"{0}\". Message: \"{1}\".", mergePath, e.Message));
            return; // failed to insert chunk document, return from merge method

        }

        var altChunk = new AltChunk { Id = altChunkId };

        //append the page break
        if (appendPageBreak)
            try
            {
                AppendPageBreak(mainDocPart);
                Log.Info("Successfully appended page break.");
            }
            catch (Exception ex)
            {
                Log.Error(string.Format("Error appending page break. Message: \"{0}\".", ex.Message));
                return; // return if page break insertion failed
            }

        // insert the document 
        var last = mainDocPart.Document
            .Body
            .Elements()
            .LastOrDefault(e => e is Paragraph || e is AltChunk);
        try
        {
            if (last == null)
                mainDocPart.Document.Body.InsertAt(altChunk, 0);
            else
                last.InsertAfterSelf(altChunk);
            Log.Info(string.Format("Successfully inserted new doc \"{0}\" into destination.", mergePath));
        }
        catch (Exception ex)
        {
            Log.Error(string.Format("Error merging document \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
            return; // return if the merge was not successful
        }

        try
        {
            mainDocPart.Document.Save();
        }
        catch (Exception ex)
        {
            Log.Error(string.Format("Error saving destination document after merging \"{0}\". Message: \"{1}\".", mergePath, ex.Message));
        }
    }`

If I open the merged document with Word, I can see its content (tables, text, paragraphs...), but if I open it from code again, the inner text is "" (an empty string). I need the inner text to reflect what the document contains because I have to replace placeholders such as "@@name@@" with other text, and I can't do that if the inner text is empty.

This is the inner XML of the merged document:

[image: inner XML of the merged document]

This is how I open the merged document:

DestinationDocument = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true);

How can I read the inner text of the document? Or how can I merge these documents into a single one so that this problem would not occur anymore?

Marian Simonca

2 Answers

When documents are merged with altChunks, each one is stored like an embedded attachment inside the original Word document. The client (MS Word) handles the rendering of the altChunk sections, so the resulting document won't contain the Open XML markup of the merged documents.

If you want to use the resulting document for further programmatic post-processing, use Open XML PowerTools. Please refer to my answer here.

Open XML PowerTools: https://github.com/OfficeDev/Open-Xml-PowerTools
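
As a rough illustration of that approach, here is a minimal sketch of merging with the PowerTools DocumentBuilder class. It assumes the Open XML PowerTools assembly is referenced; the file names "First.docx", "Second.docx" and "Merged.docx" are hypothetical placeholders, not names from the question.

```csharp
using System.Collections.Generic;
using OpenXmlPowerTools;

static class MergeSketch
{
    public static void Main()
    {
        // Each Source wraps one input document; the second argument
        // (keepSections) keeps that document's section properties.
        var sources = new List<Source>
        {
            new Source(new WmlDocument("First.docx"), true),   // hypothetical input
            new Source(new WmlDocument("Second.docx"), true),  // hypothetical input
        };

        // Writes a single document that contains the real markup of every
        // source, so no altChunk elements are involved.
        DocumentBuilder.BuildDocument(sources, "Merged.docx"); // hypothetical output
    }
}
```

Because the output contains ordinary paragraphs and runs rather than altChunk references, opening it with WordprocessingDocument should expose the merged text for placeholder replacement.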

Flowerking
  • Thank you, worked like a charm. The only problem left to solve is if it is possible to insert a page break after a document using DocumentBuilder. Good sir @Flowerking, thank you again :D – Marian Simonca May 19 '16 at 08:28
  • Do you know a way to convert a .rtf file to a .docx file? I need to merge a .rtf doc with a .docx, and DocumentBuilder requires .docx files. – Marian Simonca May 28 '16 at 13:08

The problem is that the documents are not really merged: the altChunk element only marks the place where the alternative content should go, and it holds a reference to that content.
When you open this document in MS Word, it merges all of the alternative content for you automatically, so once you resave the document with MS Word it no longer contains altChunk elements.
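
To see this for yourself, a small diagnostic along the following lines may help. It is only a sketch, assuming the Open XML SDK is referenced; the class name AltChunkInspector and the method name Report are made up for illustration.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkInspector // hypothetical helper name
{
    // Prints how many altChunk references and imported parts the
    // document holds, and how much text the body itself contains.
    public static void Report(string path)
    {
        using (var doc = WordprocessingDocument.Open(path, false))
        {
            var mainPart = doc.MainDocumentPart;
            var body = mainPart.Document.Body;

            int references = body.Descendants<AltChunk>().Count();
            int importedParts = mainPart.AlternativeFormatImportParts.Count();

            Console.WriteLine("altChunk elements: {0}", references);
            Console.WriteLine("imported parts:    {0}", importedParts);
            Console.WriteLine("InnerText length:  {0}", body.InnerText.Length);
        }
    }
}
```

Calling AltChunkInspector.Report on the merged file should show the altChunk references next to the empty inner text, since the imported content lives in separate parts rather than in the body.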

Nevertheless, you can manipulate those altChunk DOCX files (the child documents) just as you do the main DOCX file (the parent document).

For example:

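using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

string destinationPath = "Sample.docx";
string search = "@@name@@";
string replace = "John Doe";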

using (var parent = WordprocessingDocument.Open(Path.GetFullPath(destinationPath), true))
{
    foreach (var altChunk in parent.MainDocumentPart.GetPartsOfType<AlternativeFormatImportPart>())
    {
        if (Path.GetExtension(altChunk.Uri.OriginalString) != ".docx")
            continue;

        using (var child = WordprocessingDocument.Open(altChunk.GetStream(), true))
        {
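            // Find the first text node that contains the placeholder.
            // This only matches when the placeholder sits inside a single Text element.
            var foundText = child.MainDocumentPart.Document.Body
                .Descendants<Text>()
                .FirstOrDefault(t => t.Text.Contains(search));

            if (foundText != null)
            {
                foundText.Text = foundText.Text.Replace(search, replace);
                break; // stop after the first replacement; remove to process every chunk
            }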
        }
    }
}

Alternatively, you'll need an approach that merges those documents for real. One solution is mentioned by Flowerking; another you could try is the GemBox.Document library, which merges the alternative contents for you on loading (just as MS Word does when opening).

For example:

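using System.Linq;
using GemBox.Document;

// GemBox.Document needs a license key before first use; this is the free-mode key.
ComponentInfo.SetLicense("FREE-LIMITED-KEY");

string destinationPath = "Sample.docx";
string search = "@@name@@";
string replace = "John Doe";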

DocumentModel document = DocumentModel.Load(destinationPath);

ContentRange foundText = document.Content.Find(search).FirstOrDefault();
if (foundText != null)
    foundText.LoadText(replace);

document.Save(destinationPath);
Mario Z
  • Thank you for the answer, this was helpful too. However, I can't use the GemBox.Document library because its free version is limited to 20 paragraphs. – Marian Simonca May 19 '16 at 09:44
  • Yes, Free mode has a size limitation. Nevertheless, I hope that the first suggestion (opening those altChunk DOCXs) is of use to you. – Mario Z May 19 '16 at 10:51