I have a very large (~1GB) XML file. I need to parse through it, find specific nodes, change the data in those nodes and then write it all to a new XML file. Here's the catch--there are a ton of elements that I don't care about--I don't even know what they all are--but they need to be copied over, too.

This SO post recommends I use an XmlReader so that I don't have to load the whole input file into memory. That question has this answer which recommends using the ReadToDescendant method. This almost does what I need, but the problem is that I lose all the XML before the node that I "read to". Somehow I need to copy all of the stuff I just read past to the new file. I don't care what's there, it just needs to be copied verbatim.

This SO post would work (and there are several others like it), except that it uses XmlDocument which, if I'm not mistaken, will load the whole thing into memory first. While that's fine for small files, I'd like to avoid that here.

For you visual types, here's an idea of what I want to do:

<root>
 <SomeNodeUndefinedAtDesignTime>
    <ThisNodeHasSubNodes>
        <WhichHasSubNodes_Etc/>
    </ThisNodeHasSubNodes>
 </SomeNodeUndefinedAtDesignTime>
 <AnotherUndefinedNode>
    <!--Similar to the first, who knows what all is in here-->
 </AnotherUndefinedNode>
 <!-- There may be dozens or even hundreds of these -->
 <ANodeIAmInterestedIn>
    Old data to be replaced
 </ANodeIAmInterestedIn>
 <ANodeIAmInterestedIn>
    More data to be replaced
 </ANodeIAmInterestedIn>
 <YetAnotherUndefinedNode>
    <!-- stuff -->
 </YetAnotherUndefinedNode>
</root>

I want to take that input and then output this:

<root>
 <SomeNodeUndefinedAtDesignTime>
    <ThisNodeHasSubNodes>
        <WhichHasSubNodes_Etc/>
    </ThisNodeHasSubNodes>
 </SomeNodeUndefinedAtDesignTime>
 <AnotherUndefinedNode>
    <!--Similar to the first, who knows what all is in here-->
 </AnotherUndefinedNode>
 <!-- There may be dozens or even hundreds of these -->
 <ANodeIAmInterestedIn>
    Here is my new data
 </ANodeIAmInterestedIn>
 <ANodeIAmInterestedIn>
    Here is more new data
 </ANodeIAmInterestedIn>
 <YetAnotherUndefinedNode>
    <!-- stuff -->
 </YetAnotherUndefinedNode>
</root>

Is there a way to

  1. Read the file as a stream so that I don't have to load the whole thing into memory at once
  2. Copy elements that are undefined at design time
  3. Change the data in specific elements
  4. Write the results to a new file
David

1 Answer

You can create a method with XmlReader and XmlWriter which will do exactly what you want.

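public void CopyTo(XmlReader reader, XmlWriter writer, 
    Dictionary<string, string> replacements)
{
    var currentElementName = "";
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                //Capture this before the attributes are visited
                var isEmpty = reader.IsEmptyElement;
                currentElementName = reader.Name;
                //Pass prefix and namespace so namespaced elements survive the copy
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);

                //Copy all attributes verbatim
                writer.WriteAttributes(reader, true);

                //Handle empty elements by telling the writer to close right away
                if (isEmpty)
                {
                    writer.WriteEndElement();
                    currentElementName = "";
                }
                break;
            case XmlNodeType.EndElement:
                currentElementName = "";
                //Full end tag, so <a></a> is not collapsed to <a />
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                if (replacements.TryGetValue(currentElementName, out var replacement))
                    writer.WriteString(replacement);
                else
                    writer.WriteString(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            //Other cases (DOCTYPE, entity references etc.) are not handled here
        }
    }
    //Flush once at the end; flushing after every node slows down large files
    writer.Flush();
}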

It's a simple example that does not cover every node type or edge case, but it works fine with elements.
It copies the XML structure and all elements, and replaces the inner text of the elements whose names appear in replacements.

Usage:

var replacements = new Dictionary<string, string> { ... };
//The path overloads open and close the files themselves; the writer overwrites an existing output file
using (var xmlReader = XmlReader.Create("inputFilePath"))
using (var xmlWriter = XmlWriter.Create("outputFilePath"))
{
    CopyTo(xmlReader, xmlWriter, replacements);
}

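You can also use the async versions of the XmlReader methods (for example ReadAsync, which requires XmlReaderSettings.Async to be set to true) if you want the thread to stay free for other work while large elements are being read.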
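
As a rough illustration of the async route, here is a minimal sketch of the same copy loop built on ReadAsync and the XmlWriter async methods. The class name AsyncXmlCopy, the method name CopyToAsync and the tiny in-memory input are assumptions made up for this example, not part of the original code; both the reader and the writer need Async = true in their settings, and only a few node types are handled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class AsyncXmlCopy
{
    // Hypothetical async counterpart of CopyTo (name and signature are assumptions).
    public static async Task CopyToAsync(XmlReader reader, XmlWriter writer,
        IReadOnlyDictionary<string, string> replacements)
    {
        var currentElementName = "";
        while (await reader.ReadAsync())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    // Capture this before the attributes are visited.
                    var isEmpty = reader.IsEmptyElement;
                    currentElementName = reader.Name;
                    await writer.WriteStartElementAsync(
                        reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    await writer.WriteAttributesAsync(reader, true);
                    if (isEmpty)
                    {
                        await writer.WriteEndElementAsync();
                        currentElementName = "";
                    }
                    break;
                case XmlNodeType.EndElement:
                    currentElementName = "";
                    await writer.WriteFullEndElementAsync();
                    break;
                case XmlNodeType.Text:
                    // GetValueAsync avoids blocking on very large text nodes.
                    var text = await reader.GetValueAsync();
                    await writer.WriteStringAsync(
                        replacements.TryGetValue(currentElementName, out var r) ? r : text);
                    break;
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    await writer.WriteWhitespaceAsync(reader.Value);
                    break;
                case XmlNodeType.Comment:
                    await writer.WriteCommentAsync(reader.Value);
                    break;
                // Other node types are skipped in this sketch.
            }
        }
        await writer.FlushAsync();
    }

    public static async Task Main()
    {
        // Made-up sample input, loosely following the XML in the question.
        const string input =
            "<root><Other a=\"1\"><!-- keep --></Other>" +
            "<ANodeIAmInterestedIn>Old data</ANodeIAmInterestedIn></root>";
        var replacements = new Dictionary<string, string>
        {
            ["ANodeIAmInterestedIn"] = "Here is my new data"
        };

        var output = new StringWriter();
        var readerSettings = new XmlReaderSettings { Async = true };
        var writerSettings = new XmlWriterSettings { Async = true, OmitXmlDeclaration = true };
        using (var reader = XmlReader.Create(new StringReader(input), readerSettings))
        using (var writer = XmlWriter.Create(output, writerSettings))
        {
            await CopyToAsync(reader, writer, replacements);
        }
        Console.WriteLine(output.ToString());
    }
}
```

For a real 1GB file you would create the reader and writer over the input and output files instead of the in-memory strings; the loop itself stays the same.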

David
Kote
  • I tweaked your code sample a bit and now it does all the copying the way I had wanted. Thanks for the strong push in the right direction! – David Feb 19 '16 at 18:47