0

I'm stumped on this one. I'm a little above beginner-level with PHP/MySQL and very new with posting on this site. GoDaddy switched me over to a grid server to boost performance and shed light on a problem with the way I have coded a script. It grabs data out of a CSV and attempts to insert into a normalized database.

The CSV is not normalized so there is a lot of checking to see if something exists. I originally had it opening/closing result sets, but then it was suggested to me to use prepared statements and unfortunately I have run into the same problem. I can get through about 1200 of 14k records before getting the broad "Internal Server Error". The error in the log references a security feature where it prevents hitting the FastCGI server too much in a short period of time.

What I'm trying to find out and learn is the proper method to accomplish what I'm trying to do -- check to see if something exists; if it does, get the record ID. If not, insert the data and get the new ID. My code is below. It gets the file name and a hidden attribute from a simple php file upload form and starts going from there. This will only be used by me and the data I'm inserting is public record so security isn't a major concern.

<?php

if ($_POST["upload"] == "1") {

    //Connect to the database
    $hostname = xxx;
    $username = xxx;
    $dbname = xxx;
    $password = xxx;

    $dbh = mysqli_connect($hostname,$username,$password,$dbname) or die("Problem connecting: ".mysqli_error());

    $stmt = mysqli_stmt_init($dbh);


    //check for file errors
  if ($_FILES["file"]["error"] > 0)
    { echo "Return Code: " . $_FILES["file"]["error"] . "<br>"; }
  //No file errors
  else
    {

    //If file already exists
    if (file_exists($_FILES["file"]["name"]))
    { 
            echo $_FILES["file"]["name"] . " already exists.";
            exit; 
    }

    //If it doesn't exist
    else
      {
      move_uploaded_file($_FILES["file"]["tmp_name"],
      $_FILES["file"]["name"]);
      echo "Stored in: " . $_FILES["file"]["name"] . "<br><br>";
      $strFileName = $_FILES["file"]["name"];
      }
    }

    //File reporting
    echo "Upload: " . $_FILES["file"]["name"] . "<br>";
    echo "Type: " . $_FILES["file"]["type"] . "<br>";
    echo "Size: " . ($_FILES["file"]["size"] / 1024) . " kB<br>";
    echo "Temp file: " . $_FILES["file"]["tmp_name"] . "<br>";



$row = 0;
if (($handle = fopen($strFileName, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        $row++;
        $strPermitNo = trim($data[0]);

        //Check to see if the permit is already in the database
        $sql = "SELECT LocID FROM tbl_TABC_Locations WHERE LocPermitNo = ?";

        if (mysqli_stmt_prepare($stmt, $sql))
        {
            mysqli_stmt_bind_param($stmt, "s", $strPermitNo);
            mysqli_stmt_bind_result($stmt, $intLocID);
            mysqli_stmt_execute($stmt);

            $strPermitResult = 0;

            while (mysqli_stmt_fetch($stmt))
            {
                $strPermitResult = $intLocID;
            }
        }

        //If no permits, insert it
        if ($strPermitResult == "0")
        {       
            //Clean Location name
            $strLocName = trim($data[1]);
            $strLocName = str_replace('"', "", $strLocName);
            $strLocName = str_replace(";","-", $strLocName);
            $strLocName = addslashes($strLocName);

            $strInsertQuery = "INSERT INTO tbl_TABC_Locations (LocName,LocAddress,LocCity,LocState,LocZip,LocCounty,LocPhone,LocPermitNo) VALUES (?, ?, ?, ?, ?, ?, ?, ?)";

            if (mysqli_stmt_prepare($stmt, $strInsertQuery)) 
            {
              mysqli_stmt_bind_param($stmt, 'ssssiiis', $field1, $field2, $field3, $field4, $field5, $field6, $field7, $field8);

              $field1 = $strLocName;
              $field2 = trim(addslashes($data[2]));
              $field3 = trim(addslashes($data[3]));
              $field4 = trim($data[4]);
              $field5 = trim($data[5]);
              $field6 = trim($data[6]);
              $field7 = trim($data[7]);
              $field8 = $strPermitNo;
              mysqli_stmt_execute($stmt);

              $intLocID = mysqli_insert_id($dbh);
            }
        }

        else 
        {
            $intLocID = $strPermitResult;
        }


        //Report dates
        $strReportDate = trim($data[8]);
        $aryNewDate = explode("/", $strReportDate);
        $strNewYear = $aryNewDate[0];
        $strNewMonth = $aryNewDate[1];

        //Check to see if the report date is already in there
        $sql = "SELECT ReportDateID FROM tbl_TABC_ReportDates WHERE ReportYear = ? AND ReportMonth = ?";

        if (mysqli_stmt_prepare($stmt, $sql))
        {
            mysqli_stmt_bind_param($stmt, "ii", $strNewYear, $strNewMonth);
            mysqli_stmt_bind_result($stmt, $intReportDateID);
            mysqli_stmt_execute($stmt);

            $strReportDateResult = 0;

            while (mysqli_stmt_fetch($stmt)) 
            {
                $strReportDateResult = $intReportDateID;
            }
        }   

        if ($strReportDateResult == "0")
        {
            $strInsertQuery = "INSERT INTO tbl_TABC_ReportDates (ReportMonth,ReportYear) VALUES (?, ?)";

            if (mysqli_stmt_prepare($stmt, $strInsertQuery)) 
            {
              mysqli_stmt_bind_param($stmt, "ii", $field1, $field2);

              $field1 = $strNewMonth;
              $field2 = $strNewYear;

              mysqli_stmt_execute($stmt);

              $intDateID = mysqli_insert_id($dbh);
            }
        }
        else
        {
            $intReportDateID = $strReportDateResult;
        }


        //Check to see if they have reported for the month already, and if not, add the report      
        $sql = "SELECT ReportID FROM tbl_TABC_Reports WHERE ReportDateID = ? AND LocID = ?";    

        if (mysqli_stmt_prepare($stmt, $sql))
        {
            mysqli_stmt_bind_param($stmt, "ii", $intReportDateID, $intLocID);
            mysqli_stmt_bind_result($stmt, $intReportID);
            mysqli_stmt_execute($stmt);

            $strReportIDResult = 0;

            while (mysqli_stmt_fetch($stmt)) 
            {
                $strReportIDResult = $intReportID;
            }
        }   


        if ($strReportIDResult == "0")
        {

            $strInsertQuery = "INSERT INTO tbl_TABC_Reports (LocID,ReportDateID,TaxReceipts) VALUES (?, ?, ?)";

            if (mysqli_stmt_prepare($stmt, $strInsertQuery)) 
            {
              mysqli_stmt_bind_param($stmt, "iid", $field1, $field2, $field3);

              $field1 = $intLocID;
              $field2 = $intReportDateID;
              $field3 = trim($data[9]);

              mysqli_stmt_execute($stmt);

              echo "New report<br>\n";
            }
        }
        else { echo "<b>Already reported</b><br>"; }

    }
    echo "Closing file now";
    fclose($handle);
}

mysqli_close($dbh);
}

The error from the log is this:

[2436594] [fcgid:warn] (104)Connection reset by peer: [client xxx] mod_fcgid: error reading data from FastCGI server, referer (my webpage address)

[fcgid:warn] (104)Connection reset by peer: [client xxx] mod_fcgid: ap_pass_brigade failed in handle_request_ipc function, referer (my webpage address)

Edit 12/15 (Pulled prepared statements outside of loop). Now I'm still getting "Number of variables don't match in prepared statement" errors:

$sql1 = "SELECT LocID FROM tbl_TABC_Locations WHERE LocPermitNo = ?";
$ps_ChkPermit = mysqli_stmt_prepare($stmt, $sql1);

if ($ps_ChkPermit)
{
mysqli_stmt_bind_param($stmt, "s", $strPermitNo);
mysqli_stmt_bind_result($stmt, $intLocID);
mysqli_stmt_execute($stmt);
...
}
Jason J. Nathan
  • 7,422
  • 2
  • 26
  • 37
lindenmj
  • 3
  • 6
  • I had a similar script parsing 10s of thousands of records with no problems. What are you running php with? Apache? Nginx? Check the server logs and see the error – Jason J. Nathan Dec 14 '13 at 20:48
  • 2
    Two things immediately. Firstly:you should prepare your statements once before you enter the loop that handles the CSV data. Inside the loop just bind & execute. Secondly, don't check and insert. Set the key fields that you're checking to UNIQUE in your schema, then just INSERT. If you have a duplicate MySQL will flag an error you can check for. If you don't, you can retrieve the id from `last_insert_id()` –  Dec 14 '13 at 20:48
  • What's the error that you get? if you're on linux, just type tail /var/www/apache2/error.log – omar-ali Dec 14 '13 at 20:52
  • "The error in the log references a security feature" - please add that detail to your question. "I can get through about 1200 of 14k records" - sounds like a timeout, not DoS protection. How long does this take? Anything that takes more than a few seconds in an online process should be optimised or moved to an offline system. – halfer Dec 14 '13 at 20:52
  • @JasonNathan: I'm not sure what version... its whatever GoDaddy uses on their shared accounts. – lindenmj Dec 14 '13 at 21:10
  • @omar-ali: See recent edit for exact error – lindenmj Dec 14 '13 at 21:14
  • @lindenmj, I wasn't really asking about the version but no matter. You should have a `cpanel` or `plesk` access. Check the logs there (there should be an option to do so). Also, @MikeW comments are really useful. – Jason J. Nathan Dec 14 '13 at 21:14
  • @halfer: It doesn't take more than about 30 seconds to error out... about the same amount of time it took to process about the same number records on the old setup. It would take a couple minutes to do the entire 14k record file. – lindenmj Dec 14 '13 at 21:15
  • 1
    Okay, it is timing out. Your script is taking to long to complete. In the meantime, you could break up your CSV to bite sizes if you need it done urgently... – Jason J. Nathan Dec 14 '13 at 21:16
  • Yeah, it's timing out. You can increase your [time limit](http://php.net/manual/en/function.set-time-limit.php) but if this is a live web page, 30 seconds is way too long. 1200 inserts in 30 seconds does sound rather slow though - maybe the db server is under heavy load. That can be a problem with shared hosting. If you plan to stick with this host, make the inserts a cron task, and label the UI as "importing" before it is completed. – halfer Dec 14 '13 at 21:35
  • @MikeW: That method is above my head... can you provide a good website to learn about that or modify my code? I suspect I will have to modify my database too? (edited to add: do you mean the error that comes from the UNIQUE in my schema will prevent the INSERT and I will be able to figure out the ID of the data I'm trying to get? – lindenmj Dec 14 '13 at 21:44

1 Answers1

2

In general, you want to remove as much computation from within a loop to outside of the loop. Because every line of code gets executed over and over again during each iteration of the loop.

@MikeW's suggesting that you try to slim down your script's resource requirements as much as possible. Consider the following example (untested code!):

while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {

    $sql = "SELECT LocID FROM tbl_TABC_Locations WHERE LocPermitNo = ?";

    if (mysqli_stmt_prepare($stmt, $sql)) // <-- this keeps running every time.
    {
        mysqli_stmt_bind_param($stmt, "s", $strPermitNo);
        mysqli_stmt_bind_result($stmt, $intLocID);
        mysqli_stmt_execute($stmt);
        ...
    }
}

Why prepare your SQL statement over and over if its going to be the same every time?

$sql = "SELECT LocID FROM tbl_TABC_Locations WHERE LocPermitNo = ?";

// this statement gets prepped outside the loop and runs only once.
$prepared_statement = mysqli_stmt_prepare($stmt, $sql);

// loop starts...
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {

    if ($prepared_statement)
    {
        // and then you simply bind the params and execute from within the loop.
        mysqli_stmt_bind_param($stmt, "s", $strPermitNo);
        mysqli_stmt_bind_result($stmt, $intLocID);
        mysqli_stmt_execute($stmt);
        ...
    }
}

This way, you save on resources especially when you have to process many lines in your CSV.

Of course, it will mean that you need different variable names for different queries so you can identify each one. Do this right around where you currently declare $stmt

The second part of his suggestion involves a little more work. To reduce the number of queries, you can create a new field in the database which includes both the YEAR and MONTH and set this as a UNIQUE index in mysql. That way if you try to insert an existing record, mysql will throw and error.

If there is an error while inserting, you can assume you have a report for that date. If you don't have an error then the report is new.

Then you don't have the extra step of preparing yet another query just to check for a report's existence!

As I pointed out above, you can also reduce the CSV file sizes so it doesn't take too long to complete.

Further to @halfer's suggestion, it might be simpler to run this script with PHP-CLI. No memory limit timeouts - but this will mean you need to save the uploaded files somewhere and use cron tasks to process them later...

Requires some familiarity with the command line :)

Hope this clears stuff up a little... Good luck!

Jason J. Nathan
  • 7,422
  • 2
  • 26
  • 37
  • I've never heard of PHP-CLI... but it sounds intriguing as inputting the data doesn't necessarily need to be over HTTP. Unfortunately GoDaddy doesn't support it! How do I catch errors when the script bumps up against a record that already exists? – lindenmj Dec 15 '13 at 02:12
  • Please have a look at http://php.net/mysql_error. Try to google examples. Post a new question and tag me if you still need help. Did my answer suffice in helping you with your question? – Jason J. Nathan Dec 15 '13 at 06:34
  • The mysql_error seems easy enough! I'm going to get back at it and make the changes proposed above and see what happens. I'm sure there will be more questions... – lindenmj Dec 15 '13 at 15:37
  • If a ReportDate already exists and I error out when inserting, how can I effectively get its ID for use in the next query? Wouldn't that be another query and I'm back in the same situation? – lindenmj Dec 15 '13 at 16:08
  • @lindenmj, your comments are really asking a different question. You should either ask another question or try googling your options. This is to avoid lengthy discussions via SO comments. Anyway, check out http://dev.mysql.com/doc/refman/5.5/en/insert-on-duplicate.html. – Jason J. Nathan Dec 15 '13 at 16:26
  • The documentation is a little verbose for newbies. I googled `get id of duplicate key mysql` and the first result answers your question. http://stackoverflow.com/questions/778534/mysql-on-duplicate-key-last-insert-id – Jason J. Nathan Dec 15 '13 at 16:31
  • Sorry for the tangents (and thanks for the link). I've tested your method and put prepared statements outside the loop. However, now my script is telling me I'm not matching the number of variables in the prepared statement (see edit above). I have set LocPermitNo as `UNIQUE` in the table. – lindenmj Dec 15 '13 at 16:59