(some examples here are PHP but the question is generic)
An XML document is essentially a specialization of MIME text/plain, a human-readable string. With DOM we can also normalize the two strings first, by C14N, etc. So a good starting point is the solutions of this question... That approach is reasonable for similar XML documents; for not-so-similar documents, those tools (such as UNIX diff, etc.) produce ugly and unreadable results.
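To make the "normalize by C14N, then compare" idea concrete, here is a minimal sketch in Python (chosen only because the question is generic; PHP's DOMNode::C14N plays the same role). It assumes Python 3.8+, where xml.etree.ElementTree.canonicalize is available; the helper name normalize is made up for illustration.

```python
import xml.etree.ElementTree as ET


def normalize(xml_text):
    """Return a canonical (C14N 2.0) string form of an XML document.

    strip_text=True also drops leading/trailing whitespace in text nodes,
    so pure indentation differences disappear. That is an assumption about
    what should count as "the same"; remove it if whitespace matters.
    """
    return ET.canonicalize(xml_text, strip_text=True)


# Hypothetical inputs: same content, different attribute order and layout.
doc_a = '<root><p y="2" x="1">text</p><br/></root>'
doc_b = '<root>\n  <p x="1" y="2">text</p>\n  <br></br>\n</root>'

same = normalize(doc_a) == normalize(doc_b)
```

After this step the two strings can be handed to any ordinary text diff, which is where the "ugly results for not-so-similar documents" problem starts.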
A simple solution for XML is to compare its tree first, typically as a "table of contents" (ToC) generated by getNodePath. Example:
<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>
has the "ToC" of XPaths
root
root/h1[1]
root/p[1]
root/p[2]
root/h1[2]
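A rough sketch of how such a ToC could be generated, in Python with only the standard library. It is an approximation of what getNodePath gives, not a port of it: the function name toc is made up, paths are written without a leading slash to match the list above, a positional index is added only when same-named siblings exist, and comments, processing instructions and namespace prefixes are not handled specially.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def toc(xml_text):
    """Return one XPath-like path per element, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, path):
        paths.append(path)
        # Count same-named children so we know when an index is needed.
        totals = Counter(child.tag for child in elem)
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            index = f"[{seen[child.tag]}]" if totals[child.tag] > 1 else ""
            walk(child, f"{path}/{child.tag}{index}")

    walk(root, root.tag)
    return paths


example = "<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>"
for line in toc(example):
    print(line)
```

Note that in the example document the p elements are siblings of the h1 elements, so their paths hang directly under root.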
So, if the structures differ, we do not need to compare the contents of every element to show that the documents differ; the difference between the ToCs already shows it.
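To illustrate what I mean by a "ToC diff", here is a minimal Python sketch that diffs only the path lists and ignores text content. The names toc and toc_diff are hypothetical, the path format is an approximation of getNodePath (no leading slash, index only for repeated sibling names), and difflib.unified_diff stands in for whatever diff algorithm a real tool would use.

```python
import difflib
import xml.etree.ElementTree as ET
from collections import Counter


def toc(xml_text):
    """Return one XPath-like path per element, in document order."""
    root = ET.fromstring(xml_text)
    paths = []

    def walk(elem, path):
        paths.append(path)
        totals = Counter(child.tag for child in elem)
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            index = f"[{seen[child.tag]}]" if totals[child.tag] > 1 else ""
            walk(child, f"{path}/{child.tag}{index}")

    walk(root, root.tag)
    return paths


def toc_diff(xml_a, xml_b):
    """Unified diff of the two ToCs; an empty list means same structure."""
    return list(
        difflib.unified_diff(toc(xml_a), toc(xml_b), "a", "b", lineterm="")
    )


doc_a = "<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>"
doc_b = "<root><h1>HELLO</h1><p>other</p><p>words</p><h1>bye</h1></root>"
doc_c = "<root><h1>hello</h1><p>text1</p><h1>bye</h1></root>"

same_structure = toc_diff(doc_a, doc_b)   # only the text differs
structure_diff = toc_diff(doc_a, doc_c)   # one p element is missing
```

An empty result means only the contents can differ, so the "usual diff" is still needed; a non-empty result already proves the documents differ without looking at any text.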
QUESTION: are there any "diff tools" that do this "ToC diff" before the "usual diff"?