
I need to read an Excel file with ~10,000 rows and save them into a table in MySQL. The approach I used is a foreach() loop that prepares, binds and executes each row inside it.

The execution time is around 130 seconds, which I think is pretty bad. That is on localhost, so when the script runs live (shared hosting) the execution time will surely be higher.

This is the code:

ini_set('max_execution_time', 300);

$time_start = microtime(true);

$user = 'root';
$pass = '';
$driver_options = array(
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'",
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_OBJ,
);
try {
    $dbh = new PDO('mysql:host=127.0.0.1;dbname=excel', $user, $pass, $driver_options);
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}

set_include_path(get_include_path() . PATH_SEPARATOR . 'Classes/');

/** PHPExcel_IOFactory */
include 'PHPExcel/IOFactory.php';
$inputFileName = 'original.xls';
try {
    $objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
} catch(Exception $e) {
    die('Error loading file "'.pathinfo($inputFileName,PATHINFO_BASENAME).'": '.$e->getMessage());
}

/*
    cap (postal code) [X] - loc [AK]
    targa (licence plate) [D]
    fabbrica (manufacturer) [F]
    provincia di residenza (province of residence) [V] - loc [AI]
    comune di residenza (municipality of residence) [W] - loc [AJ]
    data prima immatricolazione (date of first registration) [AB]
    dati anagrafici (personal details) [T] - loc [AG]
*/

$xls = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
$headers = $xls[1];
$loops = 0;
$rows = array_slice($xls, 1);

foreach ( $rows as $row ) {

    $excelData = array(
        'targa'                 => $row['D'],
        'fabbrica'              => $row['F'],
        'immatricolazione'      => $row['AB'],
        'cap'                   => $row['AK'] == '' ? $row['X'] : $row['AK'],
        'datiAnagrafici'        => $row['AG'] == '' ? $row['T'] : $row['AG'],
        'comuneResidenza'       => $row['AJ'] == '' ? $row['W'] : $row['AJ'],
        'provinciaResidenza'    => $row['AI'] == '' ? $row['V'] : $row['AI']
    );

    $insert = $dbh->prepare("
        INSERT INTO 
            data(targa, fabbrica, immatricolazione, cap, datiAnagrafici, comuneResidenza, provinciaResidenza)
            VALUES(:targa, :fabbrica, :immatricolazione, :cap, :datiAnagrafici, :comuneResidenza, :provinciaResidenza)
    ");
    $insert->execute($excelData);
    if ( $insert->rowCount() != 1 ) {
        echo 'Script interrupted at loop nr. '.$loops;
        break;
    }

    ++$loops;

}

$time_end = microtime(true);

$execution_time = ($time_end - $time_start);

echo '<b>Total Execution Time:</b> '.$execution_time.' s';

Is there any way I can optimize the code performance-wise? Is there something wrong with the loop?

Thank you.

C. Ovidiu
  • You could start by preparing outside of the loop - nothing changes in the query's structure, does it? (see the sketch after these comments) – kero May 12 '14 at 22:57
  • Export the Excel file to CSV and use LOAD DATA INFILE, it's blazing fast – May 12 '14 at 22:58
  • Look into auto-committing and turn it off for this – Nick Maroulis May 12 '14 at 22:59
  • @kingkero That is stunning. I moved the prepare outside of the loop and the execution time went down to 8.40s – C. Ovidiu May 12 '14 at 23:05
  • 10,000 * 30+ columns = high RAM usage. I'm not particularly familiar with PHPExcel, but you might want to find a way to iterate over it row-by-row rather than loading the entire file into memory. If your input file keeps growing you will eventually exhaust your available memory. – Sammitch May 12 '14 at 23:08
  • @kingkero After further tests it seems that by moving the prepare outside the loop, ~1500 of the rows from the file are not being inserted into MySQL, any idea why? I don't get any errors, just the execution time output at the end of the script, which means no errors were thrown – C. Ovidiu May 12 '14 at 23:38
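
A minimal sketch of what the comments above suggest: the statement prepared once outside the loop, and the whole import wrapped in a single transaction so autocommit does not fire on every row. It assumes the same $dbh connection and $rows array built in the question:

$insert = $dbh->prepare("
    INSERT INTO
        data(targa, fabbrica, immatricolazione, cap, datiAnagrafici, comuneResidenza, provinciaResidenza)
        VALUES(:targa, :fabbrica, :immatricolazione, :cap, :datiAnagrafici, :comuneResidenza, :provinciaResidenza)
"); // prepared a single time, reused for every row

$dbh->beginTransaction();
try {
    foreach ($rows as $row) {
        $insert->execute(array(
            'targa'              => $row['D'],
            'fabbrica'           => $row['F'],
            'immatricolazione'   => $row['AB'],
            'cap'                => $row['AK'] == '' ? $row['X'] : $row['AK'],
            'datiAnagrafici'     => $row['AG'] == '' ? $row['T'] : $row['AG'],
            'comuneResidenza'    => $row['AJ'] == '' ? $row['W'] : $row['AJ'],
            'provinciaResidenza' => $row['AI'] == '' ? $row['V'] : $row['AI'],
        ));
    }
    $dbh->commit();
} catch (PDOException $e) {
    $dbh->rollBack(); // undo the partial import if any insert fails
    throw $e;
}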

2 Answers


If you can easily convert the XLS to CSV, you can use the LOAD DATA INFILE statement in MySQL. That will be much faster.
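
For example, something along these lines. This is only a sketch: the file name data.csv, the separators and the column list are assumptions that depend on how the sheet is exported, and the LOCAL keyword additionally needs PDO::MYSQL_ATTR_LOCAL_INFILE => true in the driver options:

// Rough sketch: bulk-load an exported CSV straight into the table.
// 'data.csv', the terminators and the column list are assumed; adjust
// them to whatever the actual export produces.
$dbh->exec("
    LOAD DATA LOCAL INFILE 'data.csv'
    INTO TABLE data
    FIELDS TERMINATED BY ';' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (targa, fabbrica, immatricolazione, cap, datiAnagrafici, comuneResidenza, provinciaResidenza)
");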

nl-x
  • Thank you for the suggestion, I will surely do so. Just a quick question: I am not a big fan of Excel, but can the conversion be done without PHP, directly from Excel itself? – C. Ovidiu May 12 '14 at 23:06
  • @C.Ovidiu Yes. In Excel, use menu `File -> Save as...` and choose a CSV format. Also, you don't even need PHP to run the `load data infile` statement. This can also be done from your shell/workbench/phpMyAdmin. – nl-x May 12 '14 at 23:07
  • Ok, I went to look at CSV and LOAD DATA INFILE. The problem is that the Excel file has 20 columns, but I only need to store 7 of them, and from what I understand a CSV is loaded as-is into the database, meaning that all the columns in the file must also be columns in the table. Am I wrong? – C. Ovidiu May 12 '14 at 23:14
  • LOAD DATA INFILE allows you to specify the columns that you actually want to load – Mark Baker May 12 '14 at 23:26
  • @C.Ovidiu `LOAD DATA INFILE 'file.txt' INTO TABLE t1 (column1, @ignore, column2, @ignore, column3);` – nl-x May 12 '14 at 23:27
  • Thanks for the snippet. In this case though, I have to stick to xls format, not by my choice. I think the answer by @kingkero is the most useful in this case. – C. Ovidiu May 12 '14 at 23:40

You can send multiple VALUES groups in a single INSERT statement. I'd recommend batching your inserts that way.

INSERT INTO table (...) VALUES (...), (...), (...);

You can collect your values for each row into an array, and then "flush" those values after a certain number have been collected (say 1000, for example).
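
A rough sketch of that idea, assuming the same $dbh connection and $rows array as in the question; the batch size of 1000 is arbitrary, and very large batches can run into MySQL's max_allowed_packet limit:

// Collect rows into a buffer and flush it as one multi-row INSERT
// every $batchSize rows. Column mapping is the same as in the question.
$batch = array();
$batchSize = 1000;

$flush = function (array $batch) use ($dbh) {
    if (empty($batch)) {
        return;
    }
    // One "(?, ?, ?, ?, ?, ?, ?)" group per buffered row
    $placeholders = implode(', ', array_fill(0, count($batch), '(?, ?, ?, ?, ?, ?, ?)'));
    $stmt = $dbh->prepare("
        INSERT INTO data(targa, fabbrica, immatricolazione, cap, datiAnagrafici, comuneResidenza, provinciaResidenza)
        VALUES $placeholders
    ");
    // Flatten the buffered rows into a single positional-parameter list
    $params = array();
    foreach ($batch as $rowValues) {
        foreach ($rowValues as $value) {
            $params[] = $value;
        }
    }
    $stmt->execute($params);
};

foreach ($rows as $row) {
    $batch[] = array(
        $row['D'],
        $row['F'],
        $row['AB'],
        $row['AK'] == '' ? $row['X'] : $row['AK'],
        $row['AG'] == '' ? $row['T'] : $row['AG'],
        $row['AJ'] == '' ? $row['W'] : $row['AJ'],
        $row['AI'] == '' ? $row['V'] : $row['AI'],
    );
    if (count($batch) >= $batchSize) {
        $flush($batch);
        $batch = array();
    }
}
$flush($batch); // insert whatever is left in the last, partial batch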