
I am having an issue with bulk inserting into a staging table in MS SQL Server 2008 R2.

I am inserting a CSV with ~200,000 lines and it is taking roughly 5 minutes to complete.

I tried using both PDO and the sqlsrv driver. They both seem to yield poor performance.

Here is the code to give an idea of what I am doing (I included both SQLSRV and PDO code):

...
    try {
        //create structure table record
        foreach ($mapped_data as $k => $v) {
            $insert .= $k . ",";
            $values .= $v . ",";
        }
        $insert = substr($insert, 0, -1); //remove last ,
        $values = substr($values, 0, -1);
        $tableName = $table;
        if ($stageData) {
            $tableName = "stage_$table";
        }
        if ( $query == "" )
            $query = "INSERT INTO $tableName ($insert) VALUES ";
        $query .= "($values),";

        // Insert in blocks of 1000 lines
        if ($line % 1000 == 0) {
            $log->logInfo("Executing @ line: $line");
            $query = substr($query, 0, -1); //remove last ,
            $query .= ";";

            // ======================
            // = SQLSRV DRIVER CODE =
            // ======================
            sqlsrv_begin_transaction($sqlsrvConn);
            $queryResult = sqlsrv_query($sqlsrvConn,$query);
            if ($queryResult) {
                sqlsrv_commit($sqlsrvConn);
            } else {
                sqlsrv_rollback($sqlsrvConn);
            }

            // ===================
            // = PDO DRIVER CODE =
            // ===================
            $conn->beginTransaction();
            $res = $conn->prepare($query);
            if($res->execute() === false) {
                $errInfo = $res->errorInfo();
                if ( $conn->inTransaction() ) {
                    $conn->rollback();
                }
                $log->logInfo('Data importing error on line: ' . $line . $errInfo[2]);
                $errors[] = 'Data importing error on line: ' . $line . $errInfo[2];
            } else {
                if ( $conn->inTransaction() ) {
                    $conn->commit();
                    $query = "";
                    $importedRows += ($line - 6) - $importedRows;
                }
            }
        }
    }
    catch (PDOException $e) {
        if ( $conn->inTransaction() ) {
            $conn->rollBack();
        }
        $log->logInfo('PDO Exception: ' . $e->getMessage());
        $errors[] = 'PDO Exception: ' . $e->getMessage();
    }
    $line++;
} // End of while loop through each CSV Line

fclose($handle);
$totalRows = $line - 6;
$importedRows += $totalRows - $importedRows;

// Insert remaining queries afterwards...
...

I've been scouring the internet looking for possible solutions but haven't been able to find anything that worked.

I found this post, which basically says to batch rows together (which I've already done).

I also found another post that said, for PDO, to set ConnectionPooling=0. I tried that and didn't see any increase in performance.
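
For reference, this is roughly how I'm setting that option; the server name, database, and credentials below are placeholders rather than my real settings:

    // PDO_SQLSRV: ConnectionPooling passed as a DSN keyword
    $conn = new PDO(
        "sqlsrv:Server=localhost;Database=myStagingDb;ConnectionPooling=0",
        "myUser",
        "myPassword"
    );

    // SQLSRV driver: ConnectionPooling passed as a connection option
    $sqlsrvConn = sqlsrv_connect("localhost", array(
        "Database"          => "myStagingDb",
        "UID"               => "myUser",
        "PWD"               => "myPassword",
        "ConnectionPooling" => 0
    ));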

Has anyone else run into this issue with SQLSRV and PHP?

Cheers,

Gimli

  • +1 for using PDO, but you have to use prepared statements to be more protected. – bksi Oct 07 '13 at 18:01
  • Is the database server running on the same network as your PHP server? Are you possibly having a bandwidth issue? – MonkeyZeus Oct 07 '13 at 18:04
  • Have you tried benchmarking your script to see if the slow-down is in fact the INSERTs? – MonkeyZeus Oct 07 '13 at 18:06
  • The DB and PHP are running on my local development machine (which is a fairly strong machine). I have not tried benchmarking the script. Will try that – Gimli Oct 07 '13 at 18:25
  • How big is each record? How much other load is on your server? How good is your disk's write throughput? Are there a lot of large fields to index? Are you CPU-bound? Have you considered that inserting 200k records via *any* method might take 5 minutes? – Sammitch Oct 07 '13 at 18:26
  • @Gimli Please post your results when you have some time. We may be able to provide suggestions if we can see which sections/blocks of code are being slow. Thanks – MonkeyZeus Oct 07 '13 at 20:05

1 Answer


I had a somewhat similar issue. Because my problem was a lack of available memory, my server kept taking extra time to handle virtual memory. If that's not your problem, my answer will not be useful to you.

You are using string concatenation followed by substr to remove the last comma. When you use substr, it makes another copy of the string, which is memory-intensive for long strings. See this question for an example of what happens when the strings are long. When I switched to array concatenation, my speed greatly increased due to the lower memory usage. However, if you don't have memory issues, array concatenation could actually be slower for you.

A couple of other things I noticed: you only need to collect the $insert column list once, and you are not unsetting big variables as soon as you no longer need them. I don't know if correcting that kind of thing will make an appreciable difference for you or not. Here are the basic kinds of changes you could try:

    if (!isset($insert)) {
        // Collect the column names only once, on the first data row
        $insert = array();
        $collect = true;
    }
    $values = array();
    foreach ($mapped_data as $k => $v) {
        if (isset($collect))
            $insert[] = $k;
        $values[] = $v;
    }
    unset($collect);

    .....
    if (!isset($queryend))
        $queryend = array();
    $queryend[] = "(" . implode(",", $values) . ")";

    .....
    $query = "INSERT INTO $tableName ("
             . implode(",", $insert)
             . ") VALUES "
             . implode(",", $queryend);
    unset($queryend);  // always unset big things as soon as possible

    .....  // after $res = $conn->prepare($query);
    unset($query);
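
To show where those pieces fit together, here is a rough outline of the whole per-line flow, reusing the $conn, $handle, $line, $tableName, $mapped_data and $log variables from your question; treat it as a sketch rather than drop-in code:

    while (($row = fgetcsv($handle)) !== false) {
        // ... map $row into $mapped_data here ...
        // NOTE: values still need quoting/escaping exactly as in your original code

        if (!isset($insert)) {          // collect column names only once
            $insert = array();
            $collect = true;
        }
        $values = array();
        foreach ($mapped_data as $k => $v) {
            if (isset($collect))
                $insert[] = $k;
            $values[] = $v;
        }
        unset($collect);

        if (!isset($queryend))
            $queryend = array();
        $queryend[] = "(" . implode(",", $values) . ")";

        if ($line % 1000 == 0) {        // flush a 1000-row block
            $query = "INSERT INTO $tableName ("
                   . implode(",", $insert)
                   . ") VALUES "
                   . implode(",", $queryend);
            unset($queryend);           // free the big array before executing

            $conn->beginTransaction();
            $res = $conn->prepare($query);
            unset($query);              // free the big string as soon as possible
            if ($res->execute() === false) {
                $conn->rollBack();
                $log->logInfo("Data importing error near line: $line");
            } else {
                $conn->commit();
            }
        }
        $line++;
    }
    // After the loop, flush whatever is still left in $queryend the same way.

The point is that nothing grows by repeated string concatenation; the big INSERT string only exists briefly at flush time and is released right after prepare().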
miyasudokoro

  • Memory isn't an issue on my machine, but it doesn't hurt to make things more memory efficient either. I will try this and see if it helps. Thanks! – Gimli Oct 08 '13 at 11:17