I am trying to remove a paragraph from a .docx file using OpenXML (I'm using placeholder text to generate documents from a template-like .docx file), but whenever I remove a paragraph it breaks the foreach loop I'm using to iterate through the elements.

MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants();

foreach(OpenXmlElement elem in elems){
    if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
    {
        Run run = (Run)elem.Parent;
        Paragraph p = (Paragraph)run.Parent;
        p.RemoveAllChildren();
        p.Remove();
    }
}

This works: it removes my placeholder and the paragraph it is in, but the foreach loop stops iterating, and I need to do more things in that loop.

Is this an OK way to remove a paragraph in C# using OpenXML? Why does my foreach loop stop, and how can I keep it from stopping? Thanks.

edin-m

3 Answers

This is the "Halloween Problem", so called because the developers who first noticed it did so on Halloween. It arises when you mix declarative code (queries) with imperative code (deleting nodes). Descendants() is evaluated lazily: you are effectively iterating through a linked list, and if you delete nodes from that list while iterating, you invalidate the iterator. A simple way to avoid the problem is to "materialize" the results of the query in a List; you can then iterate through the list and delete nodes at will. The only difference in the following code is that it calls ToList after calling the Descendants axis.

MainDocumentPart mainPart = doc.MainDocumentPart;
IEnumerable<OpenXmlElement> elems = mainPart.Document.Body.Descendants().ToList();

foreach(OpenXmlElement elem in elems){ 
    if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##") 
    { 
        Run run = (Run)elem.Parent; 
        Paragraph p = (Paragraph)run.Parent; 
        p.RemoveAllChildren(); 
        p.Remove(); 
    } 
} 

However, I have to note that I see another bug in your code. Nothing stops Word from splitting that text into multiple text elements across multiple runs. In most cases your code will work fine, but sooner or later you or a user will take some action (such as selecting a character and accidentally hitting the bold button on the ribbon) that splits the placeholder, and then your code will no longer find it.
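As a rough illustration of the split-run issue, the sketch below compares the placeholder against the combined text of each paragraph instead of a single Text element, so a placeholder split across several runs still matches. The class and method names (SplitRunExample, RemovePlaceholderParagraphs) are made up for this example, and it assumes the placeholder is the only text in its paragraph.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class SplitRunExample
{
    // Hypothetical helper: removes every paragraph whose combined text
    // (all runs concatenated) equals the placeholder.
    // Returns the number of paragraphs removed.
    public static int RemovePlaceholderParagraphs(Body body, string placeholder)
    {
        // Materialize the matches first so that removing nodes
        // does not disturb the lazy Descendants enumeration.
        List<Paragraph> matches = body.Descendants<Paragraph>()
            .Where(p => p.InnerText.Trim() == placeholder)
            .ToList();

        foreach (Paragraph p in matches)
        {
            p.Remove();
        }

        return matches.Count;
    }
}
```

This only handles the "whole paragraph is a placeholder" case. Replacing a placeholder embedded in a longer sentence needs run-aware text handling.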

If you really want to work at the text level, then you need to use code such as what I introduce in this screen-cast: http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/08/04/introducing-textreplacer-a-new-class-for-powertools-for-open-xml.aspx

In fact, you could probably use that code verbatim to handle your use case, I believe.

Another approach, more flexible and powerful, is detailed in:

http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2011/06/13/open-xml-presentation-generation-using-a-template-presentation.aspx

While that screen-cast is about PresentationML, the same principles apply to WordprocessingML.

But even better, given that you are using WordprocessingML, is to use content controls. For one approach to document generation, see:

http://ericwhite.com/blog/map/generating-open-xml-wordprocessingml-documents-blog-post-series/

And for lots of information about using content controls in general, see:

http://www.ericwhite.com/blog/content-controls-expanded

-Eric

Eric White
  • Actually I've done .ToList(), because some other complications appeared using previous solution. Also, I'm aware of word splitting it into multiple runs (this, here, was bad example), so my placeholders don't have '_'. And my placeholders are hardcoded, so although I'm aware of Content Control advantages, I didn't use them because I don't know them well enough and have short (mini-)project schedule. Thanks for the answer, it was very insightful, more complete. – edin-m Mar 27 '12 at 10:22

You have to use two loops: the first collects the items you want to delete, and the second deletes them. Something like this:

List<Paragraph> paragraphsToDelete = new List<Paragraph>();
foreach(OpenXmlElement elem in elems){
    if(elem is Text && elem.InnerText == "##MY_PLACE_HOLDER##")
    {
        Run run = (Run)elem.Parent;
        Paragraph p = (Paragraph)run.Parent;
        paragraphsToDelete.Add(p);
    }
}

foreach (var p in paragraphsToDelete)
{
    p.RemoveAllChildren();
    p.Remove();
}
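
The same collect-then-delete idea can also be written as a single query. This is only a sketch: the class and method names (ParagraphCleanup, RemoveParagraphsContaining) are invented for illustration. It uses Ancestors<Paragraph>() rather than casting Parent, on the assumption that a run is not always a direct child of a paragraph (for example, when it sits inside a hyperlink), in which case the casts above would throw.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ParagraphCleanup
{
    // Hypothetical helper: collects the paragraphs that contain a Text
    // element equal to the placeholder, then removes them in a second pass.
    public static int RemoveParagraphsContaining(Body body, string placeholder)
    {
        // First pass: find the enclosing paragraph of each matching Text.
        List<Paragraph> paragraphsToDelete = body.Descendants<Text>()
            .Where(t => t.Text == placeholder)
            .Select(t => t.Ancestors<Paragraph>().FirstOrDefault())
            .Where(p => p != null)
            .Distinct()
            .ToList();

        // Second pass: delete. Remove() detaches the paragraph together
        // with its children, so no separate RemoveAllChildren() call is made.
        foreach (Paragraph p in paragraphsToDelete)
        {
            p.Remove();
        }

        return paragraphsToDelete.Count;
    }
}
```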
Denis Palnitsky
  • God, I'm stupid. Thanks. But why the hell does it break out of the loop in the first place? (If somebody knows; I'll leave it some time before accepting an answer. Sorry, I cannot vote, rep too low.) – edin-m Mar 26 '12 at 16:43
  • http://stackoverflow.com/questions/2545027/exception-during-iteration-on-collection-and-remove-items-from-that-collection – Denis Palnitsky Mar 26 '12 at 16:46
  • Thanks. Found another good one: http://stackoverflow.com/questions/604831/collection-was-modified-enumeration-operation-may-not-execute – edin-m Mar 26 '12 at 16:50
Dim elems As IEnumerable(Of OpenXmlElement) = MainPart.Document.Body.Descendants().ToList()
For Each elem As OpenXmlElement In elems
    ' IndexOf returns 0 when the text starts with "fullname", so test for >= 0.
    If elem.InnerText.IndexOf("fullname") >= 0 Then
        elem.RemoveAllChildren()
    End If
Next
negini