
I have an audit list full of serialized objects, and I'd like to compare them and return a list of the differences. By 'compare' I mean I want to report where the text of an element has changed, or where a node has been added (so it's not in Xml1 but it is in Xml2; it won't happen the other way around).

Sample xml:

<HotelBookingView xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Id>119</Id>
  <RoomId>1</RoomId>
  <ChangeRequested>false</ChangeRequested>
  <CourseBookings>      
    <CourseHotelLink>
      <Id>0</Id>
    </CourseHotelLink>
</CourseBookings>
</HotelBookingView>

The namespaces and the names/case of the tags will not change. All that can change in this sample is the values between the tags and the number of 'CourseHotelLink' elements (it's a serialized list).

The final result I would like is a list of the nodes that have changed, with the old value and the new value for each.

What is the best option to compare them? I am using .NET 4.0, so LINQ is an option. I need to be able to do the comparison without necessarily knowing the names of all the nodes, though I will only ever compare two objects of the same type. I have been trying to use the following code, but I can't manage to adapt it to pick out changes in text as well as extra nodes.

XmlDocument Xml1 = new XmlDocument();
XmlDocument Xml2 = new XmlDocument();
Xml1.LoadXml(list[1].Changes);
Xml2.LoadXml(list[2].Changes);
foreach (XmlNode chNode in Xml2.ChildNodes)
{
   CompareLower(chNode);
}

protected void CompareLower(XmlNode aNode)
{
    foreach (XmlNode chlNode in aNode.ChildNodes)
    {
        string Path = CreatePath(chlNode);
        if (chlNode.Name == "#text")
        {
            //all my efforts at comparing text have failed
            continue;
        }
        if (Xml1.SelectNodes(Path).Count == 0)
        {
            XmlNode TempNode = Xml1.ImportNode(chlNode, true);
            //node didn't used to exist, this works- though doesn't return values
            str = str + "New Node: " + TempNode.Name + ": " + TempNode.Value;
        }
        else
        {
            CompareLower(chlNode);
        }
    } 
}

It's likely my attempts are miles off and there is a much better way to do this; any suggestions welcome!

EDITED to add: I ended up using the MS XML Diff tool. The following code produces a big HTML table listing the two XML nodes, with the differences highlighted in green. So it's possible (though insane) to produce the HTML, search it for the text 'lightgreen' (the highlighted value), then do some string manipulation to display only the changed child node.

// Parse the two XML strings and expose them as readers for XmlDiff.
var node1 = XElement.Parse("Xml string 1 here").CreateReader();
var node2 = XElement.Parse("Xml string 2 here").CreateReader();

MemoryStream diffgram = new MemoryStream();
XmlTextWriter diffgramWriter = new XmlTextWriter(new StreamWriter(diffgram));

XmlDiff xmlDiff = new XmlDiff(XmlDiffOptions.IgnoreChildOrder);
xmlDiff.Algorithm = XmlDiffAlgorithm.Fast;
xmlDiff.Compare(node1, node2, diffgramWriter);
diffgramWriter.Flush(); // push buffered output into the MemoryStream before reading it

diffgram.Seek(0, SeekOrigin.Begin);
XmlDiffView xmlDiffView = new Microsoft.XmlDiffPatch.XmlDiffView();
StringBuilder sb = new StringBuilder();
TextWriter resultHtml = new StringWriter(sb);
// Load takes a reader over the original XML (node1 was consumed by Compare) plus the diffgram.
xmlDiffView.Load(XElement.Parse("Xml string 1 here").CreateReader(), new XmlTextReader(diffgram));

xmlDiffView.GetHtml(resultHtml);
resultHtml.Close();
UglyTeapot
  • Take a look at this post: http://stackoverflow.com/questions/167946/how-would-you-compare-two-xml-documents – Adriano Repetti May 07 '12 at 13:01
  • I haven't yet worked out how to get MS Diff and Patch to take XML strings; my XML comes from a database and I don't want to have to create files every time I want to use it... Might just be me being dense. – UglyTeapot May 07 '12 at 13:49
  • You do not need to create files, it comes with many overloads to compare files, XmlTextReader or XmlNode – Adriano Repetti May 07 '12 at 14:01
  • Any attempt at using XmlReaders ends with the error 'XmlException: The data at the root level is invalid. Line 1, position 1.', so I'm assuming the serialized objects aren't valid XML. Hence trying to find a solution that isn't related to MS Patch and Diff! I know it's non-trivial; I was hoping that putting enough constraints on it would make it easier! Or should I keep fighting the MS tool? – UglyTeapot May 07 '12 at 14:06
  • The fragment from your example is valid XML. If yours is not (and it's not something simple to fix) then any comparison may be **very** difficult. I think serialized XML is always valid, so the problem may be the **encoding** you used to save the files for your test. – Adriano Repetti May 07 '12 at 14:13

1 Answer


Using XmlDiff is the way to go; to prove it, here's some working code. I'm using your XML. If your real XML is different (or invalid), this may not work.

Original:

var xml1 = @"<HotelBookingView xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
<Id>119</Id>
<RoomId>1</RoomId>
<ChangeRequested>false</ChangeRequested>
<CourseBookings>      
    <CourseHotelLink>
    <Id>0</Id>
    </CourseHotelLink>
</CourseBookings>
</HotelBookingView>";

Different Id value in CourseBookings:

var xml2 = @"<HotelBookingView xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"">
<Id>119</Id>
<RoomId>1</RoomId>
<ChangeRequested>false</ChangeRequested>
<CourseBookings>      
    <CourseHotelLink>
    <Id>1</Id>
    </CourseHotelLink>
</CourseBookings>
</HotelBookingView>";

Low-effort way of creating readers (change to XDocument if needed):

var node1 = XElement.Parse(xml1).CreateReader();
var node2 = XElement.Parse(xml2).CreateReader();
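
If the XML arrives as a string from a database (as described in the comments on the question), a reader can also be built directly from that string. The helper below is only a sketch: `ReaderFactory` and `FromString` are hypothetical names, not part of XmlDiff, and it assumes that the "data at the root level is invalid" error is caused by a leading byte-order mark or stray whitespace in the string. That is a common cause, but it is not confirmed for this data.

```csharp
using System.IO;
using System.Xml;

public static class ReaderFactory
{
    // Hypothetical helper: builds an XmlReader over an in-memory XML string.
    public static XmlReader FromString(string xml)
    {
        // Assumption: a leading BOM (U+FEFF) or whitespace is what upsets the parser.
        string cleaned = xml.TrimStart('\uFEFF', ' ', '\t', '\r', '\n');

        // StringReader keeps everything in memory, so no temporary file is needed.
        return XmlReader.Create(new StringReader(cleaned));
    }
}
```

The returned reader can be passed to `Compare` in place of `node1` or `node2`, since that overload accepts any `XmlReader`.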

Prepare the result writer:

var result = new XDocument();
var writer = result.CreateWriter();

Do the diff:

var diff = new Microsoft.XmlDiffPatch.XmlDiff();    
diff.Compare(node1, node2, writer);
writer.Flush(); 
writer.Close();

result is now an XDocument that contains a summary of the differences:

<xd:xmldiff version="1.0" srcDocHash="14506386314386767543" options="None" fragments="no" xmlns:xd="http://schemas.microsoft.com/xmltools/2002/xmldiff">
  <xd:node match="1">
    <xd:node match="4">
      <xd:node match="1">
        <xd:node match="1">
          <xd:change match="1">1</xd:change>
        </xd:node>
      </xd:node>
    </xd:node>
  </xd:node>
</xd:xmldiff>
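
To turn that diffgram into something readable, one option is to walk it alongside the original XML. The sketch below is not part of XmlDiff; `DiffgramReader` and `DescribeTextChanges` are hypothetical names. It assumes, based only on the sample output above, that each `match` attribute is the 1-based position of a child node in the source (with insignificant whitespace dropped, as `XElement.Parse` does by default), and it handles only nested `xd:node` elements and `xd:change` on text nodes. Added nodes and other diffgram operations are ignored.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DiffgramReader
{
    private static readonly XNamespace Xd =
        "http://schemas.microsoft.com/xmltools/2002/xmldiff";

    // Returns one line per changed text value: "<path>: '<old>' -> '<new>'".
    public static List<string> DescribeTextChanges(string sourceXml, string diffgramXml)
    {
        XElement source = XElement.Parse(sourceXml);
        XElement diffgram = XElement.Parse(diffgramXml);
        var changes = new List<string>();

        // Assumption: the outermost xd:node refers to the root element of the source.
        foreach (XElement rootMatch in diffgram.Elements(Xd + "node"))
            Walk(rootMatch, source, "/" + source.Name.LocalName, changes);

        return changes;
    }

    private static void Walk(XElement diffNode, XElement sourceElement,
                             string path, List<string> changes)
    {
        List<XNode> children = sourceElement.Nodes().ToList();

        foreach (XElement op in diffNode.Elements())
        {
            // Skip anything whose match is not a single in-range position.
            int position;
            if (!int.TryParse((string)op.Attribute("match"), out position)) continue;
            if (position < 1 || position > children.Count) continue;

            XNode target = children[position - 1];

            if (op.Name == Xd + "node" && target is XElement)
            {
                // Descend into the matched element, extending the path.
                var element = (XElement)target;
                Walk(op, element, path + "/" + element.Name.LocalName, changes);
            }
            else if (op.Name == Xd + "change" && target is XText)
            {
                // Old value comes from the source, new value from the diffgram.
                changes.Add(path + ": '" + ((XText)target).Value + "' -> '" + op.Value + "'");
            }
        }
    }
}
```

Called with `xml1` and the diffgram above, this yields a single entry: `/HotelBookingView/CourseBookings/CourseHotelLink/Id: '0' -> '1'`.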
Prisoner ZERO
yamen
  • That does work, thank you! So I just need to parse the diffgram to make it be able to display things like 'CourseBookings Id was 0, now 1', which should be fun! – UglyTeapot May 08 '12 at 12:51
  • Don't forget to accept if it answers the original question :-) There's plenty of info on how to interpret the return from `xmldiff`. – yamen May 08 '12 at 20:40
  • Accepted, thanks! And... are there examples of how to interpret diffgrams to show only the differences? My weak google-fu isn't finding anything – UglyTeapot May 09 '12 at 08:59
  • Try a different question and link here so I can see. Best to frame these individually. – yamen May 09 '12 at 14:12
  • Continued here: http://stackoverflow.com/questions/10530381/how-to-parse-an-xml-diff-to-show-only-differences – UglyTeapot May 10 '12 at 08:42
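
For the narrow case described in the question (tag names never change, only values change or nodes are added), the "old value / new value" list can also be sketched with plain LINQ to XML, without XmlDiff. This is a different technique from the accepted answer, and `XmlCompare` is a hypothetical name. It assumes repeated elements such as `CourseHotelLink` can be matched by position, which only holds if new items are appended rather than inserted in the middle.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmlCompare
{
    // Lists text changes and added leaf elements, keyed by an indexed path.
    public static List<string> Compare(string oldXml, string newXml)
    {
        Dictionary<string, string> before = Leaves(XElement.Parse(oldXml));
        Dictionary<string, string> after = Leaves(XElement.Parse(newXml));
        var differences = new List<string>();

        foreach (KeyValuePair<string, string> pair in after)
        {
            string oldValue;
            if (!before.TryGetValue(pair.Key, out oldValue))
                differences.Add("Added " + pair.Key + " = '" + pair.Value + "'");
            else if (oldValue != pair.Value)
                differences.Add("Changed " + pair.Key + ": '" + oldValue + "' -> '" + pair.Value + "'");
        }
        return differences;
    }

    // Maps every leaf element to a path like /Root/List[1]/Item[2]/Id[1].
    private static Dictionary<string, string> Leaves(XElement root)
    {
        var result = new Dictionary<string, string>();
        Collect(root, "/" + root.Name.LocalName, result);
        return result;
    }

    private static void Collect(XElement element, string path, Dictionary<string, string> result)
    {
        if (!element.HasElements)
        {
            result[path] = element.Value;
            return;
        }

        // Count siblings per name so repeated elements get distinct, positional paths.
        var seen = new Dictionary<XName, int>();
        foreach (XElement child in element.Elements())
        {
            int index;
            seen.TryGetValue(child.Name, out index);
            seen[child.Name] = ++index;
            Collect(child, path + "/" + child.Name.LocalName + "[" + index + "]", result);
        }
    }
}
```

Nodes that exist only in the old document are deliberately not reported, since the question states that removals never happen.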