
The file which I'm trying to download from my PHP script is this one:

http://www.navarra.es/appsext/DescargarFichero/default.aspx?codigoAcceso=OpenData&fichero=Farmacias/Farmacias.xml 

But I can't download it with either file_get_contents() or cURL. I'm getting the error "Object reference not set to an instance of an object."

Any idea how to do it?

Thanks a lot, Pablo.

Update: here is the code:

$url = "http://www.navarra.es/appsext/DescargarFichero/default.aspx?codigoAcceso=OpenData&fichero=Farmacias/Farmacias.xml";
$simple = simplexml_load_file(file_get_contents($url));
foreach ($simple->farmacia as $farmacia)
{
    var_dump($farmacia);
}
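A side note on the code above: simplexml_load_file() expects a file name or URL, while file_get_contents() returns the document body, so nesting the two passes the XML content where a path is expected. A minimal sketch of the two consistent pairings follows. It uses a small hypothetical XML string instead of the real Farmacias.xml, and the `<nombre>` child element is an assumption made only for illustration:

```php
<?php
// Hypothetical stand-in for the downloaded Farmacias.xml content.
$xml = '<farmacias><farmacia><nombre>A</nombre></farmacia>'
     . '<farmacia><nombre>B</nombre></farmacia></farmacias>';

// Option 1: you already hold the XML as a string, so parse it as a string.
$simple = simplexml_load_string($xml);

// Option 2: you have a file name or URL, so let SimpleXML read it itself.
// A temporary local file is used here only so the sketch runs offline.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $xml);
$fromFile = simplexml_load_file($path);
unlink($path);

foreach ($simple->farmacia as $farmacia) {
    echo $farmacia->nombre, PHP_EOL;
}
```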

And the solution thanks to @Gordon:

$url = "http://www.navarra.es/appsext/DescargarFichero/default.aspx?codigoAcceso=OpenData&fichero=Farmacias/Farmacias.xml";
$file = file_get_contents($url, FALSE, stream_context_create(array('http' => array('user_agent' => 'php' ))));
$simple = simplexml_load_string($file);
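The solution above assumes both the download and the parse succeed. A slightly more defensive sketch checks the return values: file_get_contents() returns false on failure, and simplexml_load_string() returns false on malformed XML. It also collects libxml errors instead of printing warnings. The helper name `load_remote_xml` is hypothetical:

```php
<?php
// Hypothetical helper: fetch a URL with an explicit User-Agent and parse it with SimpleXML.
function load_remote_xml(string $url, string $userAgent = 'php'): SimpleXMLElement
{
    $context = stream_context_create(['http' => ['user_agent' => $userAgent]]);

    // file_get_contents() returns false if the request fails.
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not download $url");
    }

    // Collect libxml parse errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($body);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $message = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new RuntimeException("Invalid XML from $url: $message");
    }
    return $xml;
}
```

Usage would be `$simple = load_remote_xml($url);`, wrapped in a try/catch for RuntimeException.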
Puigcerber

2 Answers


You don't need cURL or file_get_contents to load XML into any of PHP's DOM-based XML parsers.

However, in your particular case the issue seems to be that the server expects a User-Agent header in the HTTP request. If no user agent is set in your php.ini (the user_agent directive), you can use the libxml functions and provide it through a stream context:

libxml_set_streams_context(
    stream_context_create(
        array(
            'http' => array(
                'user_agent' => 'php'            
            )
        )
    )
);

$dom = new DOMDocument;
$dom->load('http://www.navarra.es/app…/Farmacias.xml');
echo $dom->saveXml();
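Since the question uses SimpleXML: libxml_set_streams_context() sets the stream context used by subsequent libxml-based document loads, so simplexml_load_file() should pick up the same User-Agent as DOMDocument::load(). A sketch, assuming that behavior; it reads a local temporary file only so it can run offline, and in that case the http options are simply unused:

```php
<?php
// Stream context for subsequent libxml document loads (DOM, SimpleXML, XMLReader).
libxml_set_streams_context(
    stream_context_create(['http' => ['user_agent' => 'php']])
);

// In the real script this would be the Farmacias.xml URL; a temporary
// local file stands in here so the sketch runs without network access.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<farmacias><farmacia/><farmacia/></farmacias>');

$simple = simplexml_load_file($path);
foreach ($simple->farmacia as $farmacia) {
    var_dump($farmacia->getName());
}
unlink($path);
```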


If you don't want to parse the XML file afterwards, you can use file_get_contents as well, passing the stream context as the third argument:

echo file_get_contents(
    'http://www.navarra.es/apps…/Farmacias.xml',
    FALSE,
    stream_context_create(
        array(
            'http' => array(
                'user_agent' => 'php'            
            )
        )
    )
);
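The answer mentions php.ini: its `user_agent` directive sets the default User-Agent that PHP's http:// stream wrapper sends when no context overrides it. A sketch of setting it at runtime with ini_set() instead of passing a context to every call; the value 'php' simply mirrors the answer:

```php
<?php
// Default User-Agent for requests made through PHP's http:// stream wrapper
// (file_get_contents(), fopen(), ...) when no stream context sets one.
ini_set('user_agent', 'php');

// From here on, calls such as file_get_contents($url) need no explicit context.
echo ini_get('user_agent'), PHP_EOL;
```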


Gordon
  • ...or you could set the user agent with `curl_setopt($ch, CURLOPT_USERAGENT, "Googlebot/2.1...");` – pleasedontbelong May 12 '11 at 08:50
  • @please yes, that would work, too. But tbh, I find cURL completely overkill for this. – Gordon May 12 '11 at 08:58
  • Thanks @Gordon, it works perfectly. Do I understand correctly from your answer that you recommend using DOM or XMLReader instead of SimpleXML? – Puigcerber May 12 '11 at 12:51
  • @Puigcerber I prefer DOM over SimpleXML for the reasons mentioned in http://stackoverflow.com/questions/4803063/what-the-difference-between-phps-dom-and-simplexml-extensions/4803264#4803264 and use SimpleXML only for very simple XML files. DOM offers more control. XMLReader is a different kind of parser for different use cases, namely huge files and memory-limited environments. If you can find the time, try out all of them and see which you like best. – Gordon May 12 '11 at 13:22
  • Thanks @Gordon, I will have a look at it. – Puigcerber May 13 '11 at 09:38
  • Hey @Gordon, could you have a look at my new answer? Thanks a lot. – Puigcerber May 26 '11 at 16:46
  • @pleasedontbelong do I have to use it exactly as you wrote it, or like this: `curl_setopt($ch, CURLOPT_USERAGENT, "Googlebot/2.1 (http://www.googlebot.com/bot.html)");`? Thanks. – Puigcerber May 27 '11 at 16:53
  • @Puigcerber mine was just an example =P You can set the user agent to any agent you want; it could be a Google bot or a Mozilla agent =) Cheers! – pleasedontbelong May 28 '11 at 11:31
  • @Puigcerber the user agent can be anything that identifies the requesting side. It doesn't have to be a browser. See my answer to http://stackoverflow.com/questions/6002513/i-need-to-write-a-web-crawler-for-specific-user-agent-please-help/6002719#6002719 and/or RFC 1945 and/or http://www.useragentstring.com/pages/PHP/ – Gordon May 28 '11 at 12:11
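Gordon's comment mentions XMLReader for huge files and memory-limited environments. A minimal sketch of streaming over `<farmacia>` elements one at a time and expanding only the current element into SimpleXML. The sample string and its `<nombre>` child are hypothetical stand-ins for the real file:

```php
<?php
$xml = '<farmacias><farmacia><nombre>A</nombre></farmacia>'
     . '<farmacia><nombre>B</nombre></farmacia></farmacias>';

$reader = new XMLReader();
$reader->XML($xml); // for a file or URL, $reader->open($uri) would be used instead

$names = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'farmacia') {
        // Only this element is loaded into memory as a SimpleXMLElement.
        $farmacia = simplexml_load_string($reader->readOuterXml());
        $names[] = (string) $farmacia->nombre;
    }
}
$reader->close();

print_r($names);
```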

I have been using the solution given by @Gordon, and it was working perfectly on localhost:

$url = "http://www.navarra.es/appsext/DescargarFichero/default.aspx?codigoAcceso=OpenData&fichero=Farmacias/Farmacias.xml";
$file = file_get_contents($url, FALSE, stream_context_create(array('http' =>array('user_agent' => 'php' ))));
$simple = simplexml_load_string($file);

But when I uploaded all the files to the server... surprise, as always. I started getting the error "URL file-access is disabled in the server configuration", so I replaced all the file_get_contents() calls with this code, which I found here:

function get_content($url)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, "Googlebot/2.1...");

    ob_start();

    curl_exec($ch);
    curl_close($ch);
    $string = ob_get_contents();

    ob_end_clean();

    return $string;
}

Do you think this is a good approach?

Thanks, Pablo.
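The "URL file-access is disabled" error comes from the allow_url_fopen setting being off on the server. In that case file_get_contents() refuses http:// URLs, while the cURL extension does not depend on that setting. The directive can only be changed in the server configuration, not from a script, but a script can read it. A sketch of checking it at runtime; the function name `url_fopen_allowed` is hypothetical:

```php
<?php
// Hypothetical helper: reports whether file_get_contents()/fopen() may open URLs.
// allow_url_fopen is a system-level setting, so it can be read but not changed here.
function url_fopen_allowed(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

echo url_fopen_allowed()
    ? "file_get_contents() can read URLs\n"
    : "URL access is disabled; use cURL instead\n";
```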

Puigcerber
  • If you do `curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);` you don't need to buffer the output. Just do `$content = curl_exec($ch);`. – Gordon May 26 '11 at 17:32
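Following the suggestion in the comment above, get_content() can drop the output buffering. With CURLOPT_RETURNTRANSFER, curl_exec() returns the body as a string, or false on failure. This is a sketch, not the author's final code; the default user agent 'php' and the exception on failure are assumptions:

```php
<?php
// get_content() rewritten to use CURLOPT_RETURNTRANSFER instead of ob_start().
function get_content(string $url, string $userAgent = 'php'): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);        // body only, no response headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // curl_exec() returns the body
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

    $content = curl_exec($ch);
    if ($content === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request to $url failed: $error");
    }
    curl_close($ch);
    return $content;
}
```

The result can then be parsed as before, e.g. `$simple = simplexml_load_string(get_content($url));`.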