One problem with this idea is that xpath "destroys" information.
For many xpaths, an infinite number of xml trees match. In some cases, like the example you give, where the predicate uses "=", there is an obvious minimal xml tree which matches your xpath.
But if the predicate uses "!=", or any comparison operator other than "=", an infinite number of possibilities exist. You could try to choose a "canonical" xml tree, say the one which requires the fewest bits to represent.
Suppose, for example, you had the xpath /feed/entry/content[@source > 0]. Any xml tree of the appropriate structure in which the content node has a source attribute whose value is > 0 would match, but there are an infinite number of numbers greater than zero. By choosing the "minimal" value, presumably 1, you could attempt to canonicalize your xml.
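As a rough illustration of that canonicalization, here is a minimal Python sketch. It assumes a tiny xpath subset: absolute child steps, each with at most one predicate of the form [@attr OP integer]. The names STEP, canonical_value and minimal_tree are made up for this example, and picking the integer closest to the bound is just one possible definition of "minimal".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical grammar for one step: name, optionally followed by [@attr OP integer].
STEP = re.compile(r"^(\w+)(?:\[@(\w+)\s*(=|!=|>=|<=|>|<)\s*(-?\d+)\])?$")

def canonical_value(op, n):
    """Pick one integer v such that `v OP n` holds, staying as close to n as possible."""
    if op in ("=", ">=", "<="):
        return n
    if op in (">", "!="):
        return n + 1
    return n - 1  # op == "<"

def minimal_tree(xpath):
    """Build the smallest tree matching an xpath from the restricted subset above."""
    root = None
    parent = None
    for step in xpath.strip("/").split("/"):
        m = STEP.match(step)
        if m is None:
            raise ValueError("unsupported step: " + step)
        name, attr, op, num = m.groups()
        node = ET.Element(name) if parent is None else ET.SubElement(parent, name)
        if attr is not None:
            node.set(attr, str(canonical_value(op, int(num))))
        if root is None:
            root = node
        parent = node
    return root

tree = minimal_tree("/feed/entry/content[@source > 0]")
print(ET.tostring(tree, encoding="unicode"))
```

Anything outside that subset (nested paths inside predicates, functions, arithmetic) is rejected rather than guessed at, which is the point: the easy cases are easy only because the grammar is so small.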
Xpath predicates can contain pretty arbitrary arithmetic expressions, so the general solution to this is quite difficult, if not impossible. You could imagine a huge equation in there, and it would have to be solved in reverse to come up with values that would match the equation; but since there can be an infinite number of matching values (as long as it's really an inequality not an equation), a canonical solution would need to be found.
Many expressions of other forms also destroy information. For example, an operator like "or" always destroys information. If you know that (X or Y) == 1, you don't know whether X is 1, Y is 1, or both of them are 1; all you know for sure is that at least one of them is 1! Therefore, if you have an expression using "or", you cannot tell which of the nodes or values that are inputs to it should be 1. You can make an arbitrary choice and set both to 1, as that will satisfy the expression for sure, as will either of the two choices in which only one of them is 1.
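To make the ambiguity concrete, this small Python sketch enumerates every 0/1 assignment to X and Y and keeps the ones for which (X or Y) == 1. It is only an illustration of the counting argument, not of xpath evaluation itself.

```python
from itertools import product

# Try all four 0/1 combinations and keep those satisfying the "or".
solutions = [(x, y) for x, y in product((0, 1), repeat=2) if (x or y) == 1]

print(solutions)  # [(0, 1), (1, 0), (1, 1)]
```

Three different inputs produce the same result, so nothing in the result tells you which one to reconstruct.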
Now suppose there are several expressions in the xpath which refer to the same set of values. You then end up with a system of simultaneous equations or inequalities that can be virtually impossible to solve. If you restrict the allowable xpath to a small subset of its full power, you can solve this problem. I suspect the fully general case is similar to the Turing halting problem, however: given an arbitrary program (the xpath), figure out a set of consistent data that matches the program and is in some sense minimal.
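For a restricted case, a brute-force search is enough to show the idea. The sketch below assumes the predicates have already been translated by hand into Python functions over integer variables, and that a solution exists within a small bound. The function smallest_solution, the variables a and b, and the three constraints are all invented for the example, and "minimal" is taken to mean the smallest sum of absolute values.

```python
from itertools import product

def smallest_solution(constraints, names, bound=20):
    """Search integers in [-bound, bound] for an assignment satisfying every constraint.

    Among the matches, return the one with the smallest sum of absolute values;
    on ties the first one found wins. Returns None if nothing matches.
    """
    best = None
    for values in product(range(-bound, bound + 1), repeat=len(names)):
        env = dict(zip(names, values))
        if all(check(env) for check in constraints):
            cost = sum(abs(v) for v in values)
            if best is None or cost < best[0]:
                best = (cost, env)
    return None if best is None else best[1]

# Hypothetical constraints, as if several predicates referred to the same two values.
constraints = [
    lambda e: e["a"] > 2,
    lambda e: e["a"] + e["b"] == 10,
    lambda e: e["a"] != e["b"],
]

result = smallest_solution(constraints, ["a", "b"])
print(result)
```

The search space grows exponentially with the number of variables and only covers bounded integers, so this says nothing about the general problem. It just shows what "solve in reverse, then pick a canonical answer" looks like when the subset is small enough.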