
I'd like to merge a few XML files. The destination XML is slightly different from the source files: it contains an additional root element.

For example:

The destination xml:

<?xml version="1.0" encoding="utf-8"?>
<customer ID="A0001" name="customername">
.....
.....
</customer>


Source xml:

<?xml version="1.0" encoding="utf-8"?>
<order number="00001">
    <.....>
    <.....>
    <.....>
</order>

Every source XML file needs to be inserted between <customer ...> and </customer>.

The source files can be very large (e.g. 2 GB).

I can write the destination XML file with the root element, then read each source file using XmlTextReader and copy it across like this:

string myOrder = textReader.ReadOuterXml();
writer.WriteRaw(myOrder);



Result (where every order comes from a different XML file):

<?xml version="1.0" encoding="utf-8"?>
<customer ID="A0001" name="customername">
    <order number="00001">
        <.....>
        <.....>
        <.....>
    </order>
    <order number="00002">
        <.....>
        <.....>
        <.....>
    </order>
    <order number="00003">
        <.....>
        <.....>
        <.....>
    </order>
</customer>


But I'm afraid of out-of-memory exceptions with the large files, because ReadOuterXml() returns the whole element as a single string.

Any suggestions?

John Doe

1 Answer


It sounds like in this particular case, assuming all the files are really using UTF-8, you can basically cheat. .NET 4 makes this particularly easy:

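// Requires: using System.Collections.Generic; using System.IO; using System.Linq;
public void MergeFiles(string outputPath, string prefix, string suffix,
                       IEnumerable<string> files)
{
    // Start the output with the XML declaration and the opening root tag.
    File.WriteAllText(outputPath, prefix);
    // Lazily read each source file line by line, skipping its first line (the XML declaration).
    var lines = files.SelectMany(file => File.ReadLines(file).Skip(1));
    File.AppendAllLines(outputPath, lines);
    // Close the root element.
    File.AppendAllText(outputPath, suffix);
}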

This isn't quite as efficient as it might be, as it'll open the output file three times - but it's written about as simply as I could make it. Note that lines here is lazy - this won't read the source files completely into memory; it'll read a line at a time.
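As a rough sketch of the single-open variant hinted at above (not part of the original answer), the same line-skipping idea can be written against one StreamWriter. The class name SingleOpenMerger is made up for illustration, and the code assumes that every source file is UTF-8 with its XML declaration alone on the first line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not from the original answer.
public static class SingleOpenMerger
{
    public static void MergeFiles(string outputPath, string prefix, string suffix,
                                  IEnumerable<string> files)
    {
        // Open the output once. UTF8Encoding(false) writes UTF-8 without a byte order mark.
        using (var writer = new StreamWriter(outputPath, false, new UTF8Encoding(false)))
        {
            writer.WriteLine(prefix);
            foreach (string file in files)
            {
                // File.ReadLines is lazy, so each source file is read one line at a time.
                // Skip(1) drops the first line, assumed to be the XML declaration.
                foreach (string line in File.ReadLines(file).Skip(1))
                {
                    writer.WriteLine(line);
                }
            }
            writer.WriteLine(suffix);
        }
    }
}
```

Unlike the version above, this writes a line break after the prefix, so the first order element starts on its own line.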

This line-based approach does rely on each file being in UTF-8 and starting with the XML declaration on a line of its own, since Skip(1) simply drops the first line of every file. There are far more robust streaming approaches you could use, but if you're confident of your source format, this is very simple...

EDIT: Sample usage:

string prefix = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n"
              + "<customer ID=\"A0001\" name=\"customername\">";
MergeFiles("output.xml", prefix, "</customer>", sourceFiles);
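
For comparison, one of the more robust streaming approaches mentioned above could be sketched with XmlReader and XmlWriter.WriteNode. WriteNode copies the current node and its children from the reader to the writer, so no single large string is built the way ReadOuterXml() builds one. This is only a sketch under stated assumptions: the class name XmlStreamMerger and its parameters are hypothetical, and the customer attributes are taken from the example in the question. Memory use should stay roughly independent of file size, but that is worth measuring on real inputs.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not from the original answer.
public static class XmlStreamMerger
{
    public static void MergeOrders(string outputPath, string customerId,
                                   string customerName, IEnumerable<string> sourceFiles)
    {
        using (XmlWriter writer = XmlWriter.Create(outputPath))
        {
            // Write the XML declaration and the additional root element.
            writer.WriteStartDocument();
            writer.WriteStartElement("customer");
            writer.WriteAttributeString("ID", customerId);
            writer.WriteAttributeString("name", customerName);

            foreach (string file in sourceFiles)
            {
                using (XmlReader reader = XmlReader.Create(file))
                {
                    // Skip the declaration, comments and whitespace, stopping on the root element.
                    reader.MoveToContent();
                    // Copy the root element and everything inside it to the output.
                    writer.WriteNode(reader, false);
                }
            }

            writer.WriteEndElement();   // </customer>
            writer.WriteEndDocument();
        }
    }
}
```

Because the reader parses each file, this version does not depend on where the XML declaration sits or on the source encoding, at the cost of parsing every file rather than copying text.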

Jon Skeet