I have a very large (~1 GB) XML file. I need to parse it, find specific nodes, change the data in those nodes, and then write everything to a new XML file. The catch is that there are a ton of elements I don't care about (I don't even know what they all are), but they need to be copied over, too.
This SO post recommends using an XmlReader so that I don't have to load the whole input file into memory. That question has an answer that recommends the ReadToDescendant method. This almost does what I need, but the problem is that I lose all the XML before the node that I "read to". Somehow I need to copy everything I just read past to the new file. I don't care what's there; it just needs to be copied verbatim.
This SO post would work (and there are several others like it), except that it uses XmlDocument, which, if I'm not mistaken, loads the whole document into memory first. That's fine for small files, but I'd like to avoid it here.
For you visual types, here's an idea of what I want to do:
<root>
  <SomeNodeUndefinedAtDesignTime>
    <ThisNodeHasSubNodes>
      <WhichHasSubNodes_Etc/>
    </ThisNodeHasSubNodes>
  </SomeNodeUndefinedAtDesignTime>
  <AnotherUndefinedNode>
    <!--Similar to the first, who knows what all is in here-->
  </AnotherUndefinedNode>
  <!-- There may be dozens or even hundreds of these -->
  <ANodeIAmInterestedIn>
    Old data to be replaced
  </ANodeIAmInterestedIn>
  <ANodeIAmInterestedIn>
    More data to be replaced
  </ANodeIAmInterestedIn>
  <YetAnotherUndefinedNode>
    <!-- stuff -->
  </YetAnotherUndefinedNode>
</root>
I want to take that input and then output this:
<root>
  <SomeNodeUndefinedAtDesignTime>
    <ThisNodeHasSubNodes>
      <WhichHasSubNodes_Etc/>
    </ThisNodeHasSubNodes>
  </SomeNodeUndefinedAtDesignTime>
  <AnotherUndefinedNode>
    <!--Similar to the first, who knows what all is in here-->
  </AnotherUndefinedNode>
  <!-- There may be dozens or even hundreds of these -->
  <ANodeIAmInterestedIn>
    Here is my new data
  </ANodeIAmInterestedIn>
  <ANodeIAmInterestedIn>
    Here is more new data
  </ANodeIAmInterestedIn>
  <YetAnotherUndefinedNode>
    <!-- stuff -->
  </YetAnotherUndefinedNode>
</root>
Is there a way to do all of the following?
- Read the file as a stream so that I don't have to load the whole thing into memory at once
- Copy elements that are undefined at design time
- Change the data in specific elements
- Write the results to a new file
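For reference, here is a minimal sketch of the kind of streaming copy the list above describes: an XmlReader and an XmlWriter run side by side, every node is copied one at a time, and only the elements of interest get new text. It is an illustration under stated assumptions, not a tested solution for a 1 GB file. The names XmlStreamRewriter, Rewrite, WriteShallowNode, targetName and replace are all made up for this example. It also assumes the target elements contain only text (no child elements) and that the input has no DTD or unexpanded entity references.

```csharp
using System;
using System.Xml;

// Hypothetical helper: all names here are illustrative, not part of any library.
static class XmlStreamRewriter
{
    // Copies every node from reader to writer one at a time. For elements
    // whose qualified name equals targetName, the text content is replaced
    // with replace(oldText). Assumes target elements hold text only.
    public static void Rewrite(XmlReader reader, XmlWriter writer,
                               string targetName, Func<string, string> replace)
    {
        reader.Read();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == targetName)
            {
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                // Consumes the element and leaves the reader on the node after
                // its end tag, so no extra Read() call is needed here.
                string oldText = reader.ReadElementContentAsString();
                writer.WriteString(replace(oldText));
                writer.WriteEndElement();
            }
            else
            {
                WriteShallowNode(reader, writer);
                reader.Read();
            }
        }
    }

    // Writes only the current node (not its subtree), so children can still
    // be inspected on later iterations. DTD and entity-reference nodes are
    // deliberately not handled in this sketch.
    static void WriteShallowNode(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.XmlDeclaration:
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
        }
    }
}
```

Usage would be something like creating the reader with XmlReader.Create("input.xml") and the writer with XmlWriter.Create("output.xml"), both inside using blocks, and then calling XmlStreamRewriter.Rewrite(reader, writer, "ANodeIAmInterestedIn", old => "Here is my new data"). The file names are placeholders. Because each node is written as soon as it is read, memory use should stay roughly constant regardless of file size. The output is equivalent XML rather than a byte-for-byte copy (for example, empty elements may be serialized slightly differently).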