

I'm using a simple PHP script on the Altervista hosting provider to extract data from a very large HTML table (more than 6,300 rows) at this link.

The problem is that I hit "Maximum execution time of 30 seconds exceeded" during the row loop.
I'd like to produce XML data, or even plain CSV text, from that table. Is there a faster way than looping over each row?

<?php
set_time_limit(3000);
ini_set('max_execution_time', 3000);

// Fetch the remote ranking page and append one XML child per table row.
function XML_Append($XML, $Q, $Sex, $TabCnt, $TabName) {
    $pagecontent = file_get_contents($Q);
    echo "DONE fetch";

    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    // The source HTML is not valid, so suppress libxml parse warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($pagecontent);
    libxml_clear_errors();

    $tables = $doc->getElementsByTagName('table');
    $rows = $tables->item($TabCnt)->getElementsByTagName('tr');
    $rowLen = $rows->length;
    echo $rowLen;
    for ($ir = 0; $ir < $rowLen; ++$ir) {
        echo $ir . "\r\n";
        $row = $rows->item($ir);   // DOMNodeList item access
    }
    unset($doc);
}
$QUERY_SPRINT_FEMMINILE = "http://risultati.fitri.it/rank.asp?Anno=%ANNO%&TRank=S&Ss=F&PunDal=0.00&PunAl=999.99";
$QUERY_SPRINT_MASCHILE  = "http://risultati.fitri.it/rank.asp?Anno=%ANNO%&TRank=S&Ss=M&PunDal=0.00&PunAl=999.99";

// Year comes from the query string, defaulting to 2019.
if (isset($_GET['Anno'])) {
    $ANNO = $_GET['Anno'];
} else {
    $ANNO = "2019";
}
$QUERY = str_replace("%ANNO%", $ANNO, $QUERY_SPRINT_MASCHILE);

$xml = new SimpleXMLElement('<DocumentElement/>');
XML_Append($xml, $QUERY, "M", 1, "SP");
echo "DONE";
?>

The loop code is:

foreach ($rows as $row) {
    $xmlTable = $XML->addChild($TabName);
    $xmlTable->addChild('_S', $Sex);
    $cols = $row->getElementsByTagName('td');
    $colLen = $cols->length;

    for ($i = 0; $i < $colLen; ++$i) {
        $NomeColonna = "C" . $i;                        // column name: C0, C1, ...
        $value = $cols->item($i)->nodeValue;
        $value = trim(str_replace(PHP_EOL, "", $value));
        $value = str_replace("\xc2\xa0", "", $value);   // drop non-breaking spaces
        $xmlTable->addChild($NomeColonna, $value);
    }
}
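
One alternative I'm considering is writing plain CSV directly instead of building the SimpleXMLElement tree. This is only a rough, untested sketch that reuses the same DOMDocument rows from above; `CSV_Append` is just a placeholder name:

<?php
// Sketch only: stream each table row straight to CSV output instead of
// building an XML tree in memory. Assumes $rows is the same DOMNodeList
// of <tr> elements parsed above.
function CSV_Append($out, $rows, $Sex) {
    foreach ($rows as $row) {
        $fields = array($Sex);
        foreach ($row->getElementsByTagName('td') as $col) {
            $value = trim(str_replace(PHP_EOL, "", $col->nodeValue));
            $value = str_replace("\xc2\xa0", "", $value);   // drop non-breaking spaces
            $fields[] = $value;
        }
        fputcsv($out, $fields);   // one CSV line per table row
    }
}
// Usage: $out = fopen('php://output', 'w'); CSV_Append($out, $rows, "M"); fclose($out);
?>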
  • What operations do you perform on each row? If none, maybe you can convert the `$rows` array directly to [XML data](https://stackoverflow.com/questions/1397036/how-to-convert-array-to-simplexml) or a [CSV file](https://stackoverflow.com/questions/13108157/php-array-to-csv). – showdev Sep 27 '19 at 00:18
  • It might not be a beautiful solution, but you could parse that file using regular expressions and write the output XML without creating any objects (a rough sketch of this idea is appended below). Of course, DOMDocument and SimpleXMLElement make the code easier, but with big files they take more time and require more memory. – Vitaly Sep 27 '19 at 05:18
  • I'm going to try with plain CSV, but the problem seems to be the use of the "complex" objects exposed by DOMDocument ... – d.mercanti Sep 27 '19 at 07:26
  • The HTML you're looking to process is awful, it looks like a distracted child wrote it in 1999. Literally every row has multiple instances of invalid HTML. This will have a lot to do with how long your script takes to process it, and there's not much you can do about it, unfortunately. I think even cheating and using a regex would be very slow. You could maybe contact the site owners and see if they can output a different format, but I think your best bet is increasing the timeout and waiting for it to finish. – miken32 Sep 27 '19 at 16:24
  • @miken32 I agree with you. I've already contacted the owners about an API or web service, but... no luck... for now... so I think I have no solution, because I can't change the timeout on my hosting plan. Not even 000webhostapp seems to work. – d.mercanti Sep 27 '19 at 18:48
  • ... I tried the script on my PC with the command-line version of PHP and it runs in about 1 minute; even 000webhost, which seems to have no timeout problems, does not complete the execution ... – d.mercanti Sep 28 '19 at 06:02
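
Update: a rough sketch of the regex-based idea from the comments, which I still have to try. The tag pattern is only a guess and untested against the real (invalid) markup on rank.asp:

<?php
// Sketch only: pull <td> cell contents with regular expressions instead of
// building a DOM tree, then emit one semicolon-separated line per row.
$pagecontent = file_get_contents($QUERY);
if (preg_match_all('/<tr[^>]*>(.*?)<\/tr>/si', $pagecontent, $rowMatches)) {
    foreach ($rowMatches[1] as $rowHtml) {
        preg_match_all('/<td[^>]*>(.*?)<\/td>/si', $rowHtml, $cellMatches);
        // strip tags and non-breaking spaces from each cell
        $cells = array_map(function ($c) {
            return trim(str_replace("\xc2\xa0", "", strip_tags($c)));
        }, $cellMatches[1]);
        echo implode(";", $cells) . "\r\n";
    }
}
?>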
