First off, I haven't used DOMDocument much, but I will give it a go ( and please read the full post ).
You can use the C14N() method. It's not well documented, but from my IDE I get this:
/**
* Canonicalize nodes to a string
* @link http://www.php.net/manual/en/domnode.c14n.php
* @param exclusive bool[optional] <p>
* Enable exclusive parsing of only the nodes matched by the provided
* xpath or namespace prefixes.
* </p>
* @param with_comments bool[optional] <p>
* Retain comments in output.
* </p>
* @param xpath array[optional] <p>
* An array of xpaths to filter the nodes by.
* </p>
* @param ns_prefixes array[optional] <p>
* An array of namespace prefixes to filter the nodes by.
* </p>
* @return string the canonicalized nodes as a string, or false on failure
*/
public function C14N ($exclusive = null, $with_comments = null, array $xpath = null, array $ns_prefixes = null) {}
For this post I will simply be taking the DOMDocument example from the PHP documentation page.
So, that said, for my examples I have this object to start with ( note where I put the for loop in the comments; I'll use it later for benchmarking ):
$xml = new \DOMDocument( "1.0", "ISO-8859-15" );
$xml_album = $xml->createElement( "Album" );
//---- for( $i=0; $i < 10000; $i++ ){ //for benchmarks I will be adding 30,000 nodes, to get something worth measuring performance on.
// Create some elements.
$xml_track = $xml->createElement( "Track", "The ninth symphony" );
// Set the attributes.
$xml_track->setAttribute( "length", "0:01:15" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
// Create another element, just to show you can nest to any realistic depth.
$xml_note = $xml->createElement( "Note", "The last symphony composed by Ludwig van Beethoven." );
// Append the whole bunch.
$xml_track->appendChild( $xml_note );
$xml_album->appendChild( $xml_track );
// Repeat the above with some different values..
$xml_track = $xml->createElement( "Track", "Highway Blues" );
$xml_track->setAttribute( "length", "0:01:33" );
$xml_track->setAttribute( "bitrate", "64kb/s" );
$xml_track->setAttribute( "channels", "2" );
$xml_album->appendChild( $xml_track );
$xml->appendChild( $xml_album );
//----- } //end for loop
// Parse the XML.
print $xml->saveXML();
Or, roughly, after encoding it with htmlspecialchars and a bit of indenting:
<?xml version="1.0" encoding="ISO-8859-15"?>
<Album>
<Track length="0:01:15" bitrate="64kb/s" channels="2">The ninth symphony
<Note>The last symphony composed by Ludwig van Beethoven.</Note>
</Track>
<Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>
</Album>
Good so far. Now, using the ( poorly documented ) C14N() gives us this ( minus the nice indenting, and such ). Notice the two outputs are almost the same, but the attribute order is different and the XML declaration is missing, so we will not want to compare them against each other:
<Album>
<Track bitrate="64kb/s" channels="2" length="0:01:15">The ninth symphony
<Note>The last symphony composed by Ludwig van Beethoven.</Note>
</Track>
<Track bitrate="64kb/s" channels="2" length="0:01:33">Highway Blues</Track>
</Album>
Now, generally this appears similar to plain saveXML, but it has a few more options for filtering the output, so I thought I would mention it.
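As a sketch of that filtering ( treat the details as an assumption on my part: per the manual the xpath parameter is an array with a 'query' key, and the small document here is rebuilt inline ), canonicalizing only the Note nodes might look like this:

```php
<?php
// Hypothetical sketch: filter C14N() output with its xpath parameter.
// The xpath argument is an array with a 'query' key ( and optionally
// a 'namespaces' map ); nodes not matched by the query are omitted.
$xml = new DOMDocument("1.0", "ISO-8859-15");
$xml->loadXML(
    '<Album><Track length="0:01:15" bitrate="64kb/s" channels="2">' .
    'The ninth symphony' .
    '<Note>The last symphony composed by Ludwig van Beethoven.</Note>' .
    '</Track></Album>'
);

// Select the Note elements and everything inside them, so their
// text content survives canonicalization too.
$notes = $xml->C14N(false, false, [
    'query' => '//Note/descendant-or-self::node()',
]);

echo $notes;
```

Only the matched nodes come back, which is handy if you want to hash or diff just part of a large document.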
Now, I'm not entirely sure why the concern about performance. In my limited testing I took the liberty of looping 10,000 times for 30,000 nodes ( 20,000 Track nodes, 10,000 Note nodes, and 60,000 attributes ), and even then performance was fairly good, giving me these results ( just for the function calls shown below, not for generating the DOM contents, as that is a separate concern ):
$xml->saveXML();
'elapsedTime' => '0.10 seconds',
'elapsedMemory' => '0.39 KB'
$xml->C14N();
'elapsedTime' => '0.15 seconds',
'elapsedMemory' => '0.3 KB'
// outputting to the screen should not be tracked; as shown below, it has a slight but non-zero impact on the benchmarks.
echo $xml->saveXML()
'elapsedTime' => '0.16 seconds', //+0.06 seconds
'elapsedMemory' => '0.3 KB'
echo $xml->C14N();
'elapsedTime' => '0.21 seconds', //+0.06 seconds again
'elapsedMemory' => '0.3 KB'
So the performance is slightly worse than that of saveXML, but in both cases, for the number of nodes I'm using, I would say it is very reasonable.
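For reference, the elapsedTime/elapsedMemory figures above come from a helper that isn't shown in this post; a minimal equivalent ( hypothetical, the names are mine ) built on microtime() and memory_get_usage() could look like this:

```php
<?php
// Hypothetical benchmark helper: times a callable and reports the
// wall-clock time and memory delta, formatted like the figures above.
function benchmark(callable $fn): array
{
    $t0 = microtime(true);
    $m0 = memory_get_usage();
    $fn();
    return [
        'elapsedTime'   => sprintf('%.2f seconds', microtime(true) - $t0),
        'elapsedMemory' => sprintf('%.2f KB', (memory_get_usage() - $m0) / 1024),
    ];
}

$xml = new DOMDocument("1.0", "ISO-8859-15");
$xml->loadXML('<Album><Track length="0:01:15">The ninth symphony</Track></Album>');

print_r(benchmark(function () use ($xml) {
    $xml->saveXML();
}));
```

Wrap each call you care about the same way, and run the DOM construction separately so it doesn't pollute the numbers.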
So, given we can acceptably use either saveXML or C14N, how can we compare changes to such a large string? Well, as everyone should know, you hash it. Immediately one thinks of md5, but sha1 is actually better here: it gives us a slightly longer hash, and the performance difference is negligible. In both cases hashing adds about a hundredth of a second, and gives us something easier to look at when comparing, saving in a DB, etc.
-- as a side note, I love hashing; like epoxy glue or duct tape, it just works on everything.
So we simply hash that, save it to a variable, and compare it all we want:
print md5( $xml->saveXML() );
'19edc177072416b7bbf88ea0a240be73'
'elapsedTime' => '0.11 seconds',
'elapsedMemory' => '0.39 KB'
print sha1( $xml->saveXML() );
'7c644c6e1630ffde15eee64643779e415a1746b7'
'elapsedTime' => '0.11 seconds',
'elapsedMemory' => '0.3 KB'
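Putting that together, change detection is then just a hash comparison. A minimal sketch ( assuming a document like the one built above, here rebuilt inline ):

```php
<?php
// Sketch: detect changes to a DOMDocument by hashing its serialization.
$xml = new DOMDocument("1.0", "ISO-8859-15");
$xml->loadXML(
    '<Album><Track length="0:01:15" bitrate="64kb/s" channels="2">' .
    'The ninth symphony</Track></Album>'
);

$before = sha1($xml->saveXML());

// Mutate the document: change one attribute value.
$track = $xml->getElementsByTagName('Track')->item(0);
$track->setAttribute('length', '0:01:16');

$after = sha1($xml->saveXML());
echo ($before === $after) ? "unchanged\n" : "changed\n"; // changed

// Reverting the mutation restores the original serialization,
// so the hashes match again.
$track->setAttribute('length', '0:01:15');
$reverted = sha1($xml->saveXML());
echo ($before === $reverted) ? "unchanged\n" : "changed\n"; // unchanged
```

Store the hash in a variable ( or the DB ) and compare it whenever you need to know whether the document was touched.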
Now, I will probably get knocked for using saveXML() ( and/or C14N() ), but ultimately it boils down to this. Even counting the attributes and nodes, which can be done this way ( just to cover my bases ):
$old_xp = new \DOMXpath($xml);
$old_a = $old_xp->evaluate('count( //@* )');
$old_n = $old_xp->evaluate('count( //node() )');
print 'Attributes: '.$old_a.'<br>';
print 'Nodes: '.$old_n.'<br>';
print 'Total: '.($old_a + $old_n).'<br>';
Output for 1 iteration ( check against the XML posted above ):
Attributes: 6
Nodes: 7 //4 element nodes plus 3 text nodes ( see the side note below )
Total: 13
Output for 10,000 iterations:
'elapsedTime' => '0.02 seconds',
'elapsedMemory' => '0.5 KB'
Attributes: 60000
Nodes: 60001 //30,001 element nodes ( +2 Track and +1 Note per loop, plus one Album node ) plus 30,000 text nodes
Total: 120001
As you can see, the time is faster, but because we are instantiating DOMXpath here ( which might not affect you if you already have an instance available ), memory consumption is almost double.
-- as a side note, $old_xp->evaluate('count( //node() )')
gives higher counts than you might expect: I expected 4 element nodes and got 7. The reason is that the XPath node() test matches text nodes ( as well as comments and processing instructions ), not just elements; here the 4 elements plus their 3 text content nodes make 7. This also explains why adding a Note node to the second track, which had none, raised the count by 2: one new element node plus one new text node.
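A quick sketch comparing //* ( elements only ), //text(), and //node() on the two-track document from above ( rebuilt inline here ) makes the breakdown visible:

```php
<?php
// The XPath node() test matches text nodes as well as elements --
// hence 7 total nodes instead of the 4 elements you might expect.
$xml = new DOMDocument("1.0", "ISO-8859-15");
$xml->loadXML(
    '<Album>' .
    '<Track length="0:01:15" bitrate="64kb/s" channels="2">The ninth symphony' .
    '<Note>The last symphony composed by Ludwig van Beethoven.</Note>' .
    '</Track>' .
    '<Track length="0:01:33" bitrate="64kb/s" channels="2">Highway Blues</Track>' .
    '</Album>'
);

$xp       = new DOMXpath($xml);
$elements = $xp->evaluate('count( //* )');      // 4: Album, two Track, Note
$texts    = $xp->evaluate('count( //text() )'); // 3: the three text contents
$nodes    = $xp->evaluate('count( //node() )'); // 7: elements + text nodes

echo "Elements: $elements, Text: $texts, All: $nodes\n";
```

So if you only care about elements, count //* instead of //node().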
Anyway you know the rest for this method.
However, when using the counts, if you remove one attribute and add another, the totals stay the same and the change goes undetected; the same applies to the nodes ( barring their quirky counting ).
Ultimately, there is no way to know if the document changed without looking at the actual data: what if the contents of a node change? etc...
That said, just counting may be good enough for your needs.
The choice is yours, and it really depends on what level of detail you need, and how much of a performance loss you're willing to take for that level of detail.
I suggest thoroughly benchmarking each step and then deciding which is more acceptable for your needs.
Lastly, just generating the XML gave me the following time/memory usage ( remember, saving and hashing took only 0.11 seconds ):
'elapsedTime' => '21.16 seconds',
'elapsedMemory' => '0.61 KB'
We talk about performance here, but no numbers were given; we really need to put things into context when making decisions based on performance.
Thanks,