
(some examples here are PHP but the question is generic)

An XML document is, essentially, a specialization of MIME text/plain: a human-readable string. With DOM we can also normalize the two strings, e.g. by C14N (canonicalization). So a good starting point is the solutions to this question... They are reasonable for similar XML documents, but for not-so-similar documents those tools (UNIX diff, etc.) produce ugly and unreadable results.
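For the "normalize, then text-diff" route, a minimal sketch using only Python's standard library (Python 3.8+). It canonicalizes both documents with `xml.etree.ElementTree.canonicalize` (C14N 2.0; PHP's `DOMNode::C14N` implements the older C14N 1.0), then compares them with `difflib.unified_diff`. The function name `c14n_line_diff` is hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def c14n_line_diff(xml_a, xml_b):
    """Canonicalize both documents, then run an ordinary line-based diff."""
    # C14N: attributes sorted, values double-quoted, <x/> written as <x></x>.
    canon_a = ET.canonicalize(xml_a)
    canon_b = ET.canonicalize(xml_b)
    return list(difflib.unified_diff(
        canon_a.splitlines(), canon_b.splitlines(),
        fromfile="a.xml", tofile="b.xml", lineterm=""))


# Same information, different serialization: no differences after C14N.
print(c14n_line_diff('<r b="2" a="1"/>', "<r a='1' b='2'></r>"))  # []
```

Canonicalization removes serialization noise, but it does not reflow text: a document written on a single line still diffs as one whole replaced line, which is exactly the unreadable case described above.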

A simple solution for XML is to compare its tree first, typically as a "table of contents" (ToC) of node paths generated by getNodePath (PHP's `DOMNode::getNodePath`). Example:

<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>

has the following "ToC" of XPaths:

/root
/root/h1[1]
/root/p[1]
/root/p[2]
/root/h1[2]
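A minimal Python sketch of how such a ToC can be built. It only approximates the getNodePath output format and is not a port of it; assumptions: elements only (no text or comment nodes), no namespaces, and a position index such as `[1]` added only when an element has same-named siblings. The function name `toc` is hypothetical. Note that in this XML the `p` elements are siblings of the `h1` elements, so the generated paths are flat.

```python
import xml.etree.ElementTree as ET


def toc(xml_text):
    """Return element paths in document order, in a getNodePath-like format."""
    paths = []

    def walk(elem, path):
        paths.append(path)
        # Count same-named children so we know when an index is needed.
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if counts[child.tag] > 1:
                step += f"[{seen[child.tag]}]"
            walk(child, f"{path}/{step}")

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag)
    return paths


print(toc("<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>"))
# ['/root', '/root/h1[1]', '/root/p[1]', '/root/p[2]', '/root/h1[2]']
```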

So, if the structures differ, we do not need to compare every element's content to show that the documents differ: the difference between the ToCs already shows it.
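The short-circuit described here can be sketched as a two-stage comparison: diff the ToCs first, and only when they are identical compare element contents path by path. This is a minimal Python sketch under the same assumptions as a getNodePath-like ToC (elements only, no namespaces, index only for same-named siblings); `node_map` and `two_stage_diff` are hypothetical names, and stage 2 compares only each element's own leading text (tail text in mixed content is ignored).

```python
import difflib
import xml.etree.ElementTree as ET


def node_map(xml_text):
    """Map getNodePath-like element paths to elements, in document order."""
    found = {}

    def walk(elem, path):
        found[path] = elem
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            step = child.tag
            if counts[child.tag] > 1:
                step += f"[{seen[child.tag]}]"
            walk(child, f"{path}/{step}")

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag)
    return found


def two_stage_diff(xml_a, xml_b):
    """Stage 1 diffs the ToCs; stage 2 runs only if the structure is equal."""
    a, b = node_map(xml_a), node_map(xml_b)
    toc_changes = [
        line for line in difflib.unified_diff(list(a), list(b), lineterm="")
        if line.startswith(("-", "+")) and not line.startswith(("---", "+++"))
    ]
    if toc_changes:
        # Structure differs: the ToC diff alone shows it; contents are skipped.
        return "structure", toc_changes
    # Same structure: compare each element's own leading text.
    changed = {}
    for path, elem in a.items():
        old = (elem.text or "").strip()
        new = (b[path].text or "").strip()
        if old != new:
            changed[path] = (old, new)
    return "content", changed
```

Removing the second `<p>` from the example yields `("structure", ["-/root/p[1]", "-/root/p[2]", "+/root/p"])`: because the remaining `p` has no same-named sibling, its path loses the index, which is a side effect of the getNodePath-style format.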

QUESTION: is there a "diff tool" that does this "ToC diff" before the "usual diff"?

Peter Krauss
  • I think the following can help you: "Simple Fast Algorithms for the Editing Distance between Trees and Related Problems" by Kaizhong Zhang and Dennis Shasha (Society for Industrial and Applied Mathematics, 1989). You can also look here for a tool: http://diffxml.sourceforge.net/ – Galigator Oct 02 '14 at 12:35
  • The [1989 article](http://www.grantjenks.com/wiki/_media/ideas/simple_fast_algorithms_for_the_editing_distance_between_tree_and_related_problems.pdf) is a good starting point to study the "theory of diff" of "ordered labeled trees"... My focus will be [`diffxml`](http://diffxml.sourceforge.net/) and the [comments about using it](http://www.adrianmouat.com/bit-bucket/2009/05/why-use-diffxml/)... It seems very good, thanks! – Peter Krauss Oct 02 '14 at 12:53
  • Another simple solution is to use the [usual diff-text solutions](http://stackoverflow.com/q/321294/287948) with the "ToC" as a string, and offer an interface to navigate the "ToC" by intra-element (XML as usual text) differences... Is there a web interface (e.g. in PHP) like that? [illustrating](http://twinforms.com/products/xmldiff/images/xmldiff_main.png)... – Peter Krauss Oct 02 '14 at 13:32
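The first comment points to Zhang and Shasha's tree edit distance. As a rough illustration only, here is a compact Python version of the Zhang-Shasha dynamic program with unit costs (insert, delete and relabel each cost 1), applied to element tags only. The `(label, children)` tuple format and the `to_tree` helper are hypothetical choices for this sketch; text content is ignored, and it computes only the distance, not the edit script a tool such as diffxml reports.

```python
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Convert an ElementTree element into a (label, [children]) tuple (tags only)."""
    return (elem.tag, [to_tree(child) for child in elem])


def _postorder(tree):
    """Labels and leftmost-leaf-descendant indices, numbered in postorder."""
    labels, leftmost = [], []

    def walk(node):
        label, children = node
        first_leaf = None
        for child in children:
            child_index = walk(child)
            if first_leaf is None:
                first_leaf = leftmost[child_index]
        labels.append(label)
        leftmost.append(len(labels) - 1 if first_leaf is None else first_leaf)
        return len(labels) - 1

    walk(tree)
    return labels, leftmost


def tree_edit_distance(t1, t2):
    """Zhang-Shasha ordered tree edit distance with unit costs."""
    lab1, l1 = _postorder(t1)
    lab2, l2 = _postorder(t2)
    # Keyroots: the highest-numbered node for each distinct leftmost leaf.
    kr1 = sorted({leaf: i for i, leaf in enumerate(l1)}.values())
    kr2 = sorted({leaf: j for j, leaf in enumerate(l2)}.values())
    td = [[0] * len(lab2) for _ in lab1]  # subtree-to-subtree distances

    for i in kr1:
        for j in kr2:
            # Forest distance table for the subforests ending at nodes i and j.
            ioff, joff = l1[i] - 1, l2[j] - 1
            m, n = i - ioff + 1, j - joff + 1
            fd = [[0] * n for _ in range(m)]
            for x in range(1, m):
                fd[x][0] = fd[x - 1][0] + 1  # delete everything
            for y in range(1, n):
                fd[0][y] = fd[0][y - 1] + 1  # insert everything
            for x in range(1, m):
                for y in range(1, n):
                    xi, yj = x + ioff, y + joff
                    if l1[xi] == l1[i] and l2[yj] == l2[j]:
                        # Both forests are whole trees: relabel is possible.
                        cost = 0 if lab1[xi] == lab2[yj] else 1
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[x - 1][y - 1] + cost)
                        td[xi][yj] = fd[x][y]
                    else:
                        # Reuse the subtree distance computed for an earlier keyroot.
                        p, q = l1[xi] - 1 - ioff, l2[yj] - 1 - joff
                        fd[x][y] = min(fd[x - 1][y] + 1, fd[x][y - 1] + 1,
                                       fd[p][q] + td[xi][yj])
    return td[-1][-1]


a = "<root><h1>hello</h1><p>text1</p><p>text2</p><h1>bye</h1></root>"
b = "<root><h1>hello</h1><p>text1</p><h1>bye</h1></root>"
print(tree_edit_distance(to_tree(ET.fromstring(a)), to_tree(ET.fromstring(b))))  # 1
```

Deleting one `<p>` from the question's example gives a distance of 1. Unlike the ToC comparison, this measure stays meaningful when documents are not-so-similar, at the cost of roughly quadratic-or-worse time in the number of nodes.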
