
I'm using SimpleXML to fetch a remote XML file, and I'm having some issues because sometimes SimpleXML can't load the XML. I don't know the exact reason, but I suspect the remote site takes longer than usual to return data, resulting in a timeout.

The code I use is the following:

    $xml = @simplexml_load_file($url);

    if (!$xml) {
        $database = Config_helper::get_config_option('mysql');
        $db = new \DB($database['database'], $database['server'], $database['user'], $database['password']);
        $date = date('Y-m-d H:i:s');

        $db->query("INSERT INTO gearman_job_error (timestamp, data, attempt)
            VALUES ('$date', '{$job->workload()}', '1')");

        return $job->sendFail();
    }
    else {
        foreach ($xml->point as $key => $value):

            $length = count($value);
            $timestamp = (string) $value->data[0];

            $j = 0;

            for ($i = 1; $i < $length; $i++)
            {
                $forecast[$timestamp][$time_request][] = array($variables[$j] => (string) $value->data[$i]);
                $j++;
            }

        endforeach;

        return serialize($forecast);
    }

The URLs I can't load are stored in the database, and by checking them I can confirm that they load correctly in the browser; there's no problem with them.

Example: http://mandeo.meteogalicia.es/thredds/ncss/modelos/WRF_HIST/d02/2015/02/wrf_arw_det_history_d02_20150211_0000.nc4?latitude=40.393288&longitude=-8.873433&var=rh%2Ctemp%2Cswflx%2Ccfh%2Ccfl%2Ccfm%2Ccft&point=true&accept=xml&time_start=2015-02-11T00%3A00Z&time_end=2015-02-14T20%3A00Z

My question is: how can I tell SimpleXML to take its time loading the URL? My goal is that only after a reasonable time should it give up, assume it can't load the file, and store the URL in the database.

Andre Garcia

2 Answers


simplexml_load_file itself doesn't offer any way to specify a timeout, but you can combine file_get_contents (which accepts a stream context with a timeout) and simplexml_load_string, like this:

<?php
$timeout = 30; // seconds
$url = 'http://...';

// The timeout is applied to the underlying HTTP stream
$context = stream_context_create(['http' => ['timeout' => $timeout]]);

$data = file_get_contents($url, false, $context);

if ($data !== false) {
    $xml = simplexml_load_string($data);
    print_r($xml);
}
iainn
  • I implemented this solution with a timeout of 120(!) and I think it has improved a lot, although I'm still getting a few URLs in the database. I must say I am retrieving 400 remote XMLs or more with the assistance of gearman workers. Ultimately I can test increasing the timeout even more, but I'm still not very confident. :( Suggestions? Maybe cURL? – Andre Garcia Jul 21 '16 at 16:31
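Since the comment above asks about cURL: a minimal sketch of the same fetch using cURL might look like the following. cURL lets you set the connection timeout and the total request timeout separately, which can help when the remote server accepts connections quickly but is slow to respond. The URL and timeout values here are placeholders, not the asker's actual configuration.

```php
<?php
// Sketch: fetch remote XML via cURL with explicit timeouts.
// $url, the timeout values, and the error handling are illustrative.
$url = 'http://example.com/data.xml';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // seconds allowed to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 120);         // total seconds allowed for the whole request

$data = curl_exec($ch);

if ($data === false) {
    // Request failed or timed out; curl_error() describes why.
    echo 'cURL error: ' . curl_error($ch);
} else {
    $xml = simplexml_load_string($data);
}

curl_close($ch);
```

As with file_get_contents, simplexml_load_string returns false on a parse failure, so you would still want to check $xml before using it.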

I figured out a way of doing this that suits me for now.

I set a maximum number of tries to fetch the XML; if it still doesn't work after that, the XML is probably damaged or missing.

I have tested it and the results are accurate! It's simple and more effective than setting a timeout. I guess you could always set a timeout as well.

$maxTries = 5;

do
{
  $content = @file_get_contents($url);
}
while (!$content && --$maxTries);

if ($content)
{
    // Note: simplexml_load_string() returns false on failure rather than
    // throwing an exception, so check the return value instead of using try/catch.
    $xml = simplexml_load_string($content);

    if ($xml !== false)
    {
        # Do what you have to do here #
    }
}
else
{
    echo $url;
    $job->sendFail();
}
Andre Garcia