Compare XML in PHP without hashing

Question

I am reading an XML feed and would like to compare it to an older version to check for updates.

I have two problems at the moment: I can't seem to make a copy of a SimpleXML object, and I'm not sure I can compare two of them directly.

This is my code as it stands. I'm obviously just testing on local files, but I intend to eventually load from the web.

Is it okay to use sleep for very long periods? I was thinking a 15-minute interval would be often enough for my purpose.

error_reporting(E_NOTICE);
$file = 'tmbdata_sm.xml';

$xml_old = "";
while (true) {
    $xml = simplexml_load_file($file);

    if ($xml != $xml_old) {
        foreach ($xml->channel->item as $item) {
            echo $item->title . "\n";
            echo $item->link . "\n";
        }
        $xml_old = clone $xml;
        $xml = "";
    } else {
        echo 'no change';
    }

    sleep(60);
}

score 1 · Answer 1 · answered Dec 23 '09 at 17:45


Without a definition of what "updated" means in your context, I'm afraid your question may remain unanswered. A string comparison might work, but a better and faster way would be to use filemtime(), which tells you the last time the file was modified.

Also, you should refrain from using sleep() in an infinite loop the way you are doing. I don't think having PHP running indefinitely will be healthy for your computer or your server. The proper way to do this is a cron job on UNIX or the Task Scheduler on Windows.
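
As a rough sketch of the filemtime() idea for a local file, assuming the script is started periodically by a scheduler instead of looping: the helper name `fileChangedSinceLastRun` and the state file that remembers the last seen timestamp are both made up for illustration.

```php
<?php
// Hypothetical helper (not from the question): reports whether $file has a
// different modification time than the one recorded in $stateFile, and
// records the new time so the next run compares against it.
function fileChangedSinceLastRun(string $file, string $stateFile): bool
{
    // PHP caches stat results, so clear the cache before asking again.
    clearstatcache(true, $file);

    $current = filemtime($file);
    if ($current === false) {
        throw new RuntimeException("Cannot read modification time of $file");
    }

    // No state file yet means this is the first run: treat it as changed.
    $previous = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    if ($current !== $previous) {
        file_put_contents($stateFile, (string) $current);
        return true;
    }
    return false;
}
```

A script that calls this once and exits could then be run every 15 minutes from cron, for example with an entry like `*/15 * * * * php check_feed.php` (the script name is a placeholder). This only applies to files on the local filesystem.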

answered Dec 23 '09 at 17:45

tedeh


filemtime() would be an elegant solution, but as I want to read the XML over HTTP I don't think it will work (although I will try it). Can you provide an explanation, or a link, as to why using sleep in an infinite loop is a bad thing? I'm not disagreeing with you, I'd just like to know and couldn't find any info (and I mean running PHP from the command line; obviously an infinite loop for a script on the web is not a good thing). My host (nfs) currently doesn't allow cron jobs, and I can see on my local machine that PHP uses neither CPU nor RAM while in sleep(5*60) – aland Dec 24 '09 at 13:26
Perhaps check the HTTP headers? I'm not sure – Pedro Mar 17 '11 at 17:21
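
One way the HTTP header idea might look, as a sketch only: it assumes the feed's server actually sends an `ETag` or `Last-Modified` header, which is not guaranteed. The helper name `extractValidators` and the example URL are placeholders.

```php
<?php
// Hypothetical helper: pulls the ETag and Last-Modified values out of raw
// header lines, such as the array returned by get_headers($url).
function extractValidators(array $headerLines): array
{
    $found = ['etag' => null, 'last-modified' => null];

    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // status line such as "HTTP/1.1 200 OK"
        }
        [$name, $value] = explode(':', $line, 2);
        $name = strtolower(trim($name)); // header names are case-insensitive
        if (array_key_exists($name, $found)) {
            $found[$name] = trim($value);
        }
    }
    return $found;
}

// Usage sketch (performs a network request; the URL is a placeholder):
// $now = extractValidators(get_headers('http://example.com/feed.xml'));
// $changed = ($now !== $previous); // $previous saved from the last check
```

If both values come back as null, the server offers nothing to compare and the feed body itself has to be downloaded and checked.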

score 0 · Accepted Answer · edited Nov 13 '11 at 21:25


I don't think you can compare SimpleXML objects this way.

I would download the XML using whatever you feel comfortable with (say, the cURL extension) and compare the XML text strings. Only when you find they are different, use simplexml_load_string() to parse the XML text.
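
A minimal sketch of that approach, assuming the previous feed text is kept in a variable (or a file) between checks. The helper name `parseIfChanged` and the URL in the usage comment are placeholders, not part of the original code.

```php
<?php
// Hypothetical helper: compares the raw XML text first and only parses it
// when it differs from the previously seen text. Returns null if unchanged.
function parseIfChanged(string $newXml, ?string $oldXml): ?SimpleXMLElement
{
    if ($newXml === $oldXml) {
        return null; // identical text, nothing to parse
    }

    $xml = simplexml_load_string($newXml);
    if ($xml === false) {
        throw new RuntimeException('Feed is not well-formed XML');
    }
    return $xml;
}

// Usage sketch (the URL is a placeholder):
// $old = null;
// $new = file_get_contents('http://example.com/feed.xml');
// $xml = parseIfChanged($new, $old);
// if ($xml !== null) {
//     foreach ($xml->channel->item as $item) {
//         echo $item->title . "\n";
//         echo $item->link . "\n";
//     }
//     $old = $new; // remember the text, not the SimpleXML object
// }
```

Keeping the string instead of the parsed object also sidesteps the need to copy a SimpleXML object at all. Note that this treats any byte-level difference, including whitespace or comments, as a change.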


answered Dec 20 '09 at 19:21

Roland Bouman



Or even just use `file_get_contents()` on the URL, as that's the string equivalent of `simplexml_load_file()`. Otherwise, I was about to give the same answer: just compare them as strings. – Josh Davis Dec 21 '09 at 03:26
What if the order of XML elements in the latest doc changes? Text comparison will say that the documents are different even though they have the same semantics. IMO you need to sort the downloaded version before doing any comparisons. Take a look at http://stackoverflow.com/questions/2788404/sort-xml-nodes-with-php – xvga Dec 20 '11 at 09:03
@xvga you're complicating things without good reason. Without detailed knowledge of the vocabulary you can't just assume the element order is or isn't important for the semantics. Besides, the application could still be interested in the change even though the document may be semantically equivalent. (Think for example of changes in comments) – Roland Bouman Jan 22 '12 at 10:24
@RolandBouman well, that happened in my situation: order didn't matter for the semantics – xvga Jan 22 '12 at 14:15
@xvga Can we agree that in the generic case, and without any further specification of the requirements, it is not obvious that the elements should be sorted, and if so, how they should be sorted? – Roland Bouman Jan 26 '12 at 23:54
