
I am trying to read an XML file from a URL with the help of the XMLReader Iterators (https://gist.github.com/hakre/5147685):

$reader = new XMLReader();
$reader->open($filename);


$element = new XMLReaderNode($reader);
$it = new XMLElementIterator($reader, 'coupon');
$data = array();

$i = 0;
foreach ($it as $index => $element) {
    if ($i == 0) {
        $xml = $element->asSimpleXML();
        //print_r($xml->children());
        foreach ($xml as $k => $v) {
            $data[0][strtolower("{$k}")] = "{$v}";
        }
        $i++; // only process the first coupon
    }
}
print_r($data);

It's working fine with a small file, but it's taking a long time to read the XML file from the URL.

Can I first download the file from the URL and then read it?
Is the way I am doing it right?
Is there any other alternative?

jogesh_pi
  • You want to download the XML file from a URL and then use XMLReader to iterate over it, right? Because your title kind of states something different from what your text asks for. – Andresch Serj May 06 '14 at 11:37
  • Regexp works much faster sometimes. One time I used it for importing 65,000 products into my application. I used Regexp and it worked fine. – Mohebifar May 06 '14 at 11:38
  • @Mohebifar 1. The bottleneck seems to be downloading the XML file, not extracting the information from the XML. 2. It doesn't sound very clever to reinvent the square wheel here, now does it? – Andresch Serj May 06 '14 at 11:39
  • 1. He is talking about the slowness of parsing XML. I offered him a way. 2. Maybe this wheel is not optimized for him. XMLReader makes a full hierarchical tree from the document. Sometimes this is not necessary for the user and it has extra overhead. For example, if he is looking for just ``s, it takes more time to parse the entire document, which contains `` and extra tags like ``, ``, ``, ... – Mohebifar May 06 '14 at 11:45
  • @AndreschSerj, right now I am fetching the XML data from the URL, not yet downloading the file. – jogesh_pi May 06 '14 at 12:00
  • @Mohebifar 1. Not sure about that. 2. I assume this is not a one-time thing, since downloading/processing time normally isn't an issue with one-time tasks. For regular tasks, regexing an XML document instead of using a proper XML parser will most likely lead to errors and problems, don't you think? – Andresch Serj May 06 '14 at 12:02

1 Answer


If I understand your question right, it just takes a long time to download the large file every time.

But you can cache the file locally by first downloading the XML from the HTTP URI and then storing it to disk.

This is very useful while you develop your software: doing the remote request every time just to fetch the XML is needless overhead, and I assume the data is not so fresh that it changes between each of your parsing tests and you would require those changes in the XML.

I suggest doing something along the lines of the answer to Download File to server from URL:

$filename  = "http://someurl/file.xml";
$cachefile = "file.xml";

if (!is_readable($cachefile))
{
    file_put_contents($cachefile, fopen($filename, 'r'));
}

$reader = new XMLReader();
$reader->open($cachefile);

This little example will create $cachefile in case it does not exist. If it already exists, it will not download the file again.

So it will only take longer once to load that file. If the XML file is really large, you can also first download it with an HTTP client that supports resume (partial transfers), like the wget or curl command-line utilities; then, if something goes wrong with the transfer, you don't have to download the whole file again.
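A minimal sketch of such a resumable pre-download (using the example URL from the snippet above; wget's -c and curl's -C - flags continue a partial transfer instead of starting over):

```shell
# resume-capable download with wget (-c = continue a partial file)
wget -c http://someurl/file.xml -O file.xml

# or with curl (-C - = continue at the offset of the existing file)
curl -C - -o file.xml http://someurl/file.xml
```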

You then just operate on your local copy. You wouldn't need to change your code at all; $filename would simply point to the local file instead.
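As a variation (my own sketch, not part of the snippet above), you can wrap the caching in a small helper with a freshness check, so a stale copy gets re-downloaded after a configurable age. The function name and the one-hour TTL here are just illustrative choices:

```php
<?php
// Hypothetical helper (not from the original code): download the XML
// to $cachefile only when the cached copy is missing or older than
// $maxAge seconds, then return the local path for XMLReader::open().
function refreshCache($url, $cachefile, $maxAge)
{
    if (!is_readable($cachefile) || filemtime($cachefile) < time() - $maxAge) {
        // file_put_contents() accepts a stream resource as data
        file_put_contents($cachefile, fopen($url, 'r'));
    }
    return $cachefile;
}

// usage, matching the example above (URL and TTL are example values):
// $reader = new XMLReader();
// $reader->open(refreshCache("http://someurl/file.xml", "file.xml", 3600));
```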

hakre