
I'm trying to parse an RSS feed and I am getting what appears to be an empty DOM Document object. My current code is:

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

    $curl = curl_init();
    curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
    curl_setopt( $curl, CURLOPT_URL, $xml_url );

    $xml = curl_exec( $curl );
    curl_close( $curl );

    //$xml = iconv('UTF-8', 'UTF-8//IGNORE', $xml);
    //$xml = utf8_encode($xml);
    $document = new DOMDocument;
    $document->loadXML( $xml ); 
    if( ini_get('allow_url_fopen') ) {
      echo "allow url fopen? Yes";
    }
    echo "<br />";
    var_dump($document);

    $items = $document->getElementsByTagName("item");

    foreach ($items as $item) {
        $title = $item->getElementsByTagName('title');
        echo $title;
    }

    $url = 'https://thehockeywriters.com/category/san-jose-sharks/feed/';
    $xml = simplexml_load_file($url);
    foreach ($items as $item) {
        $title = $item->title;
        echo $title;
    }
    print_r($xml);
    echo "<br />";
    var_dump($xml);
    echo "<br />hello?";

This code contains two separate attempts at parsing the same URL, based on answers and suggestions from the following Stack Overflow posts:
Example 1
Example 2

Things I have tried or looked up:
1. Checked to make sure that allow_url_fopen is allowed
2. Made sure that the feed is UTF-8 encoded
3. Validated the XML
4. Code examples provided on previously linked Stack Overflow posts

Here is my current output from the var_dump and echo calls:

allow url fopen? Yes
object(DOMDocument)#2 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)" 
["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL 
["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true) 
["version"]=> string(3) "1.0" ["xmlVersion"]=> string(3) "1.0" 
["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL 
["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false) 
["preserveWhiteSpace"]=> bool(true) ["recover"]=> bool(false) ["substituteEntities"]=> bool(false) 
["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL 
["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL ["lastChild"]=> NULL 
["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL 
["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" } 
bool(false) 
hello?
Kurt Leadley
  • Neither of the answers you've looked at previously are using SSL. Take a look at https://stackoverflow.com/questions/4372710/php-curl-https I think the issue is the certificate. – user3783243 Mar 28 '19 at 03:02
  • Hmm, I tried the quickfix `curl_setopt( $curl, CURLOPT_SSL_VERIFYPEER, false);` just to see if it would work and it did not. Plus, I guess that is a security issue as well. – Kurt Leadley Mar 28 '19 at 03:09

2 Answers


The only issue I had with your code was that, without a user agent set, the request for the feed returned a 403 error.

In the future, you can use curl_getinfo (called before curl_close) to read the status code of the request and check that it is 200, which means OK, to confirm the request did not fail.

$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
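
A minimal sketch of that check wrapped in a function, which also covers the case where curl_exec itself returns false (a transport failure such as a refused connection, where there is no HTTP status at all). The function name fetchFeed, the default user agent string, and the 15 second timeout are assumptions made for this illustration, not part of the original code:

```php
<?php
/**
 * Fetch a URL with cURL and return the response body, or throw with a reason.
 * fetchFeed() and its default user agent are hypothetical names/values.
 */
function fetchFeed(string $url, string $userAgent = "Mozilla/5.0 (compatible; FeedReader/1.0)"): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($curl, CURLOPT_TIMEOUT, 15); // assumed value; tune as needed

    $body = curl_exec($curl);
    // Read the error text and status code while the handle is still in scope.
    $error = curl_error($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($body === false) {
        // No response was received, so there is no status code to inspect.
        throw new RuntimeException("cURL transport error: $error");
    }
    if ($httpCode !== 200) {
        throw new RuntimeException("Unexpected HTTP status: $httpCode");
    }
    return $body;
}
```

It would be called as `$data = fetchFeed($url);` inside a try/catch, so a 403 like the one above surfaces as an exception message instead of an empty document.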

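Aside from that, there were a few mistakes within your loops: the DOMDocument loop echoes $title, which is a node list rather than a string, and the SimpleXML loop iterates over $items from the DOMDocument attempt instead of the items in $xml.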

With SimpleXML:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$feed = new SimpleXMLElement($data);
// list all titles...
foreach ($feed->channel->item as $item)
{
    echo $item->title, "<br>\n";
}

With DOMDocument:

<?php
$url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_URL, $url);
$data = curl_exec($curl);
$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
curl_close($curl);

if ($httpcode !== 200)
{
    echo "Failed to retrieve feed... Error code: $httpcode";
    die();
}

$xml = new DOMDocument();
$xml->loadXML($data);
// list all titles...
foreach ($xml->getElementsByTagName("item") as $item)
{
    foreach ($item->getElementsByTagName("title") as $title)
    {
        echo $title->nodeValue, "<br>\n";
    }
}
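
The DOMDocument example above assumes loadXML succeeds. When the body is not XML (for example an error page, or an empty string), the document stays empty, which matches the var_dump in the question. A hedged sketch of how the parse result could be checked, using libxml's internal error buffer; parseFeed is a hypothetical helper name, not something from the original code:

```php
<?php
/**
 * Parse a string as XML. Returns the document, or null on failure with
 * libxml's messages collected in $errors.
 * parseFeed() is a hypothetical helper used only for this sketch.
 */
function parseFeed(string $data, array &$errors = []): ?DOMDocument
{
    $errors = [];
    if ($data === '') {
        // An empty body cannot be XML; newer PHP versions reject it outright.
        $errors[] = 'Empty response body';
        return null;
    }

    // Collect parse errors in a buffer instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $ok = $document->loadXML($data);

    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message) . " (line {$error->line})";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the earlier setting

    return $ok ? $document : null;
}

$errors = [];
$document = parseFeed('<html><body><p>403 Forbidden</body>', $errors);
if ($document === null) {
    echo "Could not parse feed:\n", implode("\n", $errors), "\n";
}
```

Returning null keeps the caller from looping over an empty document without noticing that the fetch or the parse went wrong.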

If you just want to print the title/description of all items:

foreach ($feed->channel->item as $item)
{
    echo $item->title;
    echo $item->description;
    // uncomment the below line to print only the first entry.
    // break;
}

If you want just the first entry, without using a foreach:

echo $feed->channel->item[0]->title;
echo $feed->channel->item[0]->description;

Saving the title and description to an array for later use:

$result = [];
foreach ($feed->channel->item as $item)
{
    $result[] = 
    [
        'title' => (string)$item->title,
        'description' => (string)$item->description
    ];
    // could make a key => value alternatively from the above with 
    // title as key like this: 
    // $result[(string)$item->title] = (string)$item->description;
}

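Foreach with a MySQLi/PDO prepared statement ($stmt is assumed to be prepared before the loop):

foreach ($feed->channel->item as $item)
{
    // Cast to plain strings; SimpleXML returns element objects.
    $title = (string)$item->title;
    $description = (string)$item->description;
    // MySQLi (bind_param binds by reference, so pass variables)
    $stmt->bind_param('ss', $title, $description);
    $stmt->execute();
    // PDO
    //$stmt->bindParam(':title', $title, PDO::PARAM_STR);
    //$stmt->bindParam(':description', $description, PDO::PARAM_STR);
    //$stmt->execute();
}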
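
If the goal is fewer round trips than one INSERT per item, one option is to build a single multi-row INSERT from an array shaped like $result above. This is only a sketch: buildBulkInsert is a hypothetical helper, and the table and column names (feed_items, title, description) are assumptions to be replaced with your own schema.

```php
<?php
/**
 * Build one multi-row INSERT statement with positional placeholders.
 * Table and column names here are hypothetical.
 *
 * @param array $rows list of ['title' => string, 'description' => string]
 * @return array the SQL text and its flat parameter list, in that order
 */
function buildBulkInsert(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to insert');
    }

    // One "(?, ?)" group per row, so the values are still bound, not inlined.
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));

    $params = [];
    foreach ($rows as $row) {
        $params[] = $row['title'];
        $params[] = $row['description'];
    }

    return ["INSERT INTO feed_items (title, description) VALUES $placeholders", $params];
}

[$sql, $params] = buildBulkInsert([
    ['title' => 'First post', 'description' => 'One'],
    ['title' => 'Second post', 'description' => 'Two'],
]);
echo $sql, "\n";
```

With PDO, the result could then be run as `$pdo->prepare($sql)->execute($params);`, where $pdo is an existing connection (assumed here). For a feed of a few dozen items, the per-item loop with one prepared statement is likely just as practical.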
Prix
  • As soon as I added that useragent line, I was able to get it to work. Thank you. Fixed the loop too, like you mentioned. – Kurt Leadley Mar 28 '19 at 03:35
  • @KurtLeadley you could further use `$httpcode = curl_getinfo($curl, CURLINFO_HTTP_CODE);` to verify the code is 200 to ensure you got the data as well, see updated code. – Prix Mar 28 '19 at 03:38
  • Ahh, very nice. I need to do more research on the curl options. – Kurt Leadley Mar 28 '19 at 03:58

I selected Prix's answer for pointing out the user agent definition, but I came up with another way of writing the loop that avoids nested loops and makes it easier to access other nodes. Here is what I am using (DOMDocument solution):

$xml_url = "https://thehockeywriters.com/category/san-jose-sharks/feed/";

$curl = curl_init();
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt( $curl, CURLOPT_URL, $xml_url );
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0");

$xml = curl_exec( $curl );
curl_close( $curl );

$document = new DOMDocument;
$document->loadXML( $xml ); 

$items = $document->getElementsByTagName("item");       
foreach ($items as $item) {     
    $title = $item->getElementsByTagName('title')->item(0)->nodeValue;
    echo $title;
    $desc = $item->getElementsByTagName('description')->item(0)->nodeValue;
    echo $desc;
}
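
One caveat with the `->item(0)->nodeValue` pattern: if an item has no such child element, item(0) returns null and reading nodeValue from it raises a notice or warning instead of giving a usable value. A small sketch of a guard, demonstrated on an inline sample document rather than the live feed; childText is a hypothetical helper name:

```php
<?php
/**
 * Return the text of the first descendant element named $tag,
 * or $default when the element is missing.
 * childText() is a hypothetical helper used only for this sketch.
 */
function childText(DOMElement $parent, string $tag, string $default = ''): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->nodeValue);
}

// Inline sample data; the second item deliberately has no description.
$document = new DOMDocument();
$document->loadXML(
    '<rss><channel>'
    . '<item><title>First post</title><description>Has both fields</description></item>'
    . '<item><title>Second post</title></item>'
    . '</channel></rss>'
);

foreach ($document->getElementsByTagName('item') as $item) {
    echo childText($item, 'title'), ': ',
         childText($item, 'description', '(no description)'), "\n";
}
```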
Kurt Leadley
  • I still prefer SimpleXML, it feels to me more straight forward to use, I've added 3 other examples to show you that. – Prix Mar 28 '19 at 04:26
  • I see! This is great. I have options next time. I happened to have a working version of code similar to what I just posted myself, so I went with that. I'm interested in trying the array one. Could the array solution reduce SQL insert queries? Right now I insert to my db per loop. – Kurt Leadley Mar 28 '19 at 04:32
  • If you're using a framework like codeigniter you could use it for a bulk insert but it would be pretty much a loop behind the scenes. Just make sure you're using prepared statements to bind all that data within your foreach to avoid headaches later. – Prix Mar 28 '19 at 04:34
  • Added an example at the bottom of what it would look like using MySQLi bind_param with the foreach just in case ;) – Prix Mar 28 '19 at 04:40
  • Thanks, I've been yelled at enough to know better and always use prepared statements haha. Already have the March articles pulling from my websites db : ) http://sjsharktank.com/index.php – Kurt Leadley Mar 28 '19 at 04:42