
I need to run a daily cron job that iterates over a 6 MB CSV file to insert each of the ~10,000 entries into a MySQL table. The code I have written hangs and produces a timeout after a while.

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $dbdata = array(
            'SiteID' => $siteID,
            'TimeStamp' => $data[0],
            'ProductID' => $data[1],
            'CoordX' => $data[2],
            'CoordY' => $data[3]
        );  
        $row++;
        $STH = $DBH->prepare("INSERT INTO temp_csv (SiteID,TimeStamp,ProductID,CoordX,CoordY) VALUES (:SiteID,:TimeStamp,:ProductID,:CoordX,:CoordY)");
        $STH->execute($dbdata);
    }
    fclose($handle);
    echo $row." rows inserted.";
}

It would have been ideal to use mysql_* functions instead of PDO, so I could implode all the values into one single (albeit huge) query, but unfortunately I need to comply with some guidelines (PDO is to be used strictly).

I searched SO and found very similar questions, but none solved my problem. Here is what I tried:

1- I ran LOAD DATA INFILE and LOAD DATA LOCAL INFILE queries but kept getting "file not found" errors, although the file is definitely there with 777 permissions. The DB server and the shared hosting account live in different environments. I tried both relative paths and URLs to the CSV file, but the file couldn't be found either way (a sketch of the kind of call I mean follows this list).

2- I split the csv file into 2 files and ran the script on each to find the threshold at which the script hangs, but for each file the entries were inserted twice into the table.
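For reference, a LOAD DATA LOCAL INFILE call through PDO would look roughly like the sketch below; the file path is a placeholder, and the PDO::MYSQL_ATTR_LOCAL_INFILE option has to be enabled when the connection is created (shared hosts often disable it, which may explain the errors):

    // Sketch only: requires MYSQL_ATTR_LOCAL_INFILE, which many shared hosts disable.
    $DBH = new PDO($dsn, $user, $pass, array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
    $sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
            INTO TABLE temp_csv
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            (TimeStamp, ProductID, CoordX, CoordY)
            SET SiteID = " . (int) $siteID; // same SiteID for every row
    $DBH->exec($sql);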

I don't have access to php.ini since it's a shared hosting account (cloudsites), and I only have access to MySQL through phpMyAdmin.

What else can I try to accomplish this as efficiently as possible?

Any help is appreciated.

Nino Kay

1 Answer


The code doesn't look wrong to me. It hangs because it simply takes a while to execute. You should use PHP's set_time_limit() to prevent timeouts.

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        set_time_limit(30); // reset the timeout counter; choose a value that works for you
        // ... the rest of your script
Better, however, would be to start a background process in which the CSV is processed. It would need some sort of locking so it doesn't run in multiple instances in parallel; if you wrote the status into a file on disk, you could easily present it to your users. The same applies to a cron script (if you can do that with your hosting solution).
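A minimal sketch of that locking with flock() (the lock and status file paths are placeholders I picked for illustration):

    // Refuse to start if another instance already holds the lock.
    $lock = fopen('/tmp/csv_import.lock', 'c'); // placeholder path
    if ($lock === FALSE || !flock($lock, LOCK_EX | LOCK_NB)) {
        exit("Import is already running.\n");
    }

    // ... process the CSV here, writing progress to a status file as you go ...
    file_put_contents('/tmp/csv_import.status', $row . " rows processed"); // placeholder path

    flock($lock, LOCK_UN);
    fclose($lock);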

The use of PDO looks OK to me. I wouldn't insert all rows of the CSV in one statement, but you could insert multiple rows per query with PDO, too: build the statement and the data array for several rows at a time. It could look like this rough sketch (I did not execute it, so there may still be errors):

function insert_data($DBH, array $dbdata, array $values) {
    // single quotes, so %1$s is not mangled by string interpolation
    $sql = 'INSERT INTO temp_csv (SiteID,TimeStamp,ProductID,CoordX,CoordY) VALUES %1$s;';
    $STH = $DBH->prepare(sprintf($sql, join(', ', $values)));
    $STH->execute($dbdata);
}

if (($handle = fopen($localCSV, "r")) !== FALSE) {
    $dbdata = array();
    $values = array();
    $row = 0;
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // every placeholder needs its own named parameter, including SiteID
        $dbdata['SiteID_'.$row]    = $siteID;
        $dbdata['TimeStamp_'.$row] = $data[0];
        $dbdata['ProductID_'.$row] = $data[1];
        $dbdata['CoordX_'.$row]    = $data[2];
        $dbdata['CoordY_'.$row]    = $data[3];
        $values[] = sprintf('(:SiteID_%1$s,:TimeStamp_%1$s,:ProductID_%1$s,:CoordX_%1$s,:CoordY_%1$s)', $row);
        $row++;

        if ($row % 10 === 0) {
            set_time_limit(30); // reset the timeout for each batch
            insert_data($DBH, $dbdata, $values);
            $values = array();
            $dbdata = array();
        }
    }
    // insert the remaining rows
    if (count($values))
        insert_data($DBH, $dbdata, $values);
    fclose($handle);
    echo $row." rows inserted.";
}
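A smaller change that should already help, independent of batching: your original code prepares the statement again for every single row. Preparing it once outside the loop and wrapping all inserts in one transaction is often enough; again a rough, untested sketch using the same $DBH, $localCSV and $siteID as above:

    if (($handle = fopen($localCSV, "r")) !== FALSE) {
        // prepare once, not once per row
        $STH = $DBH->prepare("INSERT INTO temp_csv (SiteID,TimeStamp,ProductID,CoordX,CoordY) VALUES (:SiteID,:TimeStamp,:ProductID,:CoordX,:CoordY)");
        $DBH->beginTransaction();
        $row = 0;
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $STH->execute(array(
                'SiteID'    => $siteID,
                'TimeStamp' => $data[0],
                'ProductID' => $data[1],
                'CoordX'    => $data[2],
                'CoordY'    => $data[3],
            ));
            $row++;
        }
        $DBH->commit(); // one commit instead of one per row
        fclose($handle);
        echo $row." rows inserted.";
    }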

The shortcut to at least read the php.ini configuration is phpinfo(). Look into the PHP manual; a lot of the config values can be set at runtime from your code with ini_set().
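For example (whether your host honours these at runtime is something you'd have to verify):

    ini_set('max_execution_time', 300); // seconds; same effect as set_time_limit(300)
    ini_set('memory_limit', '128M');    // in case parsing the CSV needs more headroom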

Lasse
  • Regular servers can insert at least 1k records/sec. Increasing the time limit while still doing things wrong is not a valid solution. – Skpd Jan 12 '14 at 14:22