I want to parse some XML RSS feeds that I fetched with cURL, using XMLReader together with SimpleXML because the combination is supposedly "faster".

However, the result can't be parsed as XML, apparently because cURL returns it as a string:

$element = new SimpleXMLElement($xml->readOuterXML()); //String could not be parsed as XML

Here is my code:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, get_post_meta($post_id->ID, 'feed', true)); //https://wordpress.org/news/feed/
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_ENCODING , 'gzip, deflate');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$rss = curl_exec($ch);
curl_close($ch);
$xml->xml($rss);

while ($xml->read()) {
    if ($xml->nodeType == XMLReader::ELEMENT) {
        $element = new SimpleXMLElement($xml->readOuterXML()); //String could not be parsed as XML

        foreach ($element as $channel) {
            foreach ($channel->item as $item) {
                //Loop Process
            }
        }
    }
}

Am I missing something, or am I doing something wrong?

niznet
  • A couple of things I'm not sure about - if you are reading the whole XML source in one go, would it be better using SimpleXML directly rather than using XMLReader? Also have a look at https://stackoverflow.com/questions/1835177/how-to-use-xmlreader-in-php which gives a simple XMLReader example if you need one to follow. – Nigel Ren Oct 06 '19 at 14:03
  • Someone told me that combining XMLReader with SimpleXML would be faster than using SimpleXML alone, which is why I tried it. – niznet Oct 06 '19 at 14:06
  • XMLReader only really helps with large documents; in this case, mixing the two can cause (IMHO) more work. I would be interested to see on what basis they say it would be faster - always happy to learn, but in this instance, I would try SimpleXML on its own first. – Nigel Ren Oct 06 '19 at 14:10
  • I see, I should just use SimpleXML alone if all I need is to parse a website's feed. Thank you. It's frustrating to go back to where I started, but I'm glad to have learned this. – niznet Oct 06 '19 at 14:21
  • It's hard to say from your question why you get that error. Most likely the XML you downloaded is invalid. – hakre Oct 12 '19 at 14:01

1 Answer

The error you see basically means that the string returned by $xml->readOuterXml() is invalid XML.

You likely see the error at that line first because XMLReader reads the XML incrementally, without checking the whole document up front. It is a streaming reader, not a parser that loads and verifies the full document.

SimpleXMLElement's constructor, however, needs a complete, well-formed XML string, so that is the first place where the XML is actually checked.
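To see why a given string is rejected, it can help to ask libxml for its error messages instead of relying on the exception text alone. The following is only a minimal sketch: `parseXmlOrErrors()` is a hypothetical helper name, not something from the question, and it uses an inline string so it runs without a network request. It also guards against the case where the download itself failed, since `curl_exec()` returns `false` on failure.

```php
<?php
// Hypothetical helper, not part of the original code: parse a string with
// SimpleXML and return libxml's messages instead of throwing.
function parseXmlOrErrors($source): array
{
    // curl_exec() returns false on failure; an empty body is just as unusable.
    if (!is_string($source) || trim($source) === '') {
        return ['element' => null, 'errors' => ['no XML received (failed or empty download)']];
    }

    $previous = libxml_use_internal_errors(true); // collect errors silently
    libxml_clear_errors();

    $element = simplexml_load_string($source);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return ['element' => $element === false ? null : $element, 'errors' => $errors];
}

// Usage with an inline, deliberately broken document (no network needed).
$result = parseXmlOrErrors('<rss><channel><title>Demo</title></rss>');
foreach ($result['errors'] as $message) {
    echo $message, PHP_EOL;
}
```

In the question's code, the same check could be applied to `$rss` right after `curl_exec()`, and to the string returned by `readOuterXml()`, to find out which of the two is the problem.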

Apart from that, I could not reproduce the error with the wordpress.org feed URL from your example. For me, your code worked without error.

Additionally, in your example the SimpleXMLElement is created from the first element the reader offers, which is the document element, i.e. the whole XML file. For that you really don't need XMLReader; just create the element directly:

<?php
$ch = curl_init('https://wordpress.org/news/feed/');
# ...
$rss = curl_exec($ch);
curl_close($ch);
$xml = new SimpleXMLElement($rss);
foreach ($xml->channel as $channel) {
    # ...
}

Otherwise, if you would like to use XMLReader regardless, something along these lines would actually move the reader from channel element to channel element:

<?php
$ch = curl_init('https://wordpress.org/news/feed/');
# ...
$rss = curl_exec($ch);
curl_close($ch);
$xml = XMLReader::xml($rss);
if ($xml->next() && $xml->read()) { # in document element
    while ($xml->next('channel')) {
        $channel = new SimpleXMLElement($xml->readOuterXml());
        # ...
    }
}

Both examples run error free for me.

With these two examples at hand you can also easily verify the "faster" promise.
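One rough way to run that comparison is to time both approaches on the same string. The sketch below makes several assumptions: it uses a synthetic in-memory feed rather than a real download, `buildFeed()`, `countWithSimpleXml()` and `countWithXmlReader()` are hypothetical names, and it steps from `<item>` to `<item>` (the usual unit of work in an RSS feed) rather than from channel to channel. Timings will vary with feed size, PHP version and machine, so treat the numbers as indicative only.

```php
<?php
// Hypothetical helper: build a synthetic RSS string with $items <item> elements.
function buildFeed(int $items): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Demo</title>';
    for ($i = 1; $i <= $items; $i++) {
        $xml .= "<item><title>Post $i</title><link>https://example.com/$i</link></item>";
    }
    return $xml . '</channel></rss>';
}

// Approach 1: SimpleXML alone loads the whole document, then iterates.
function countWithSimpleXml(string $rss): int
{
    $xml = new SimpleXMLElement($rss);
    $count = 0;
    foreach ($xml->channel->item as $item) {
        $count++;
    }
    return $count;
}

// Approach 2: XMLReader walks to each <item> and hands it to SimpleXML.
function countWithXmlReader(string $rss): int
{
    $reader = new XMLReader();
    $reader->XML($rss);

    // Advance to the first <item>, if there is one.
    while ($reader->read() && $reader->name !== 'item') {
        // keep reading
    }

    $count = 0;
    while ($reader->name === 'item') {
        $item = new SimpleXMLElement($reader->readOuterXml());
        $count++;
        $reader->next('item'); // jump to the next sibling <item>
    }
    $reader->close();
    return $count;
}

$feed = buildFeed(1000);
foreach (['countWithSimpleXml', 'countWithXmlReader'] as $approach) {
    $start = microtime(true);
    $items = $approach($feed);
    printf("%s: %d items in %.4f s\n", $approach, $items, microtime(true) - $start);
}
```

The XMLReader variant mainly saves memory on very large documents, because only one item is materialised at a time. For an ordinary website feed that is already fully in a string, the difference may well be negligible or go the other way.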

hakre