
I have a MySQL database and I want to insert about 40,000 rows into it from PHP code, but my code takes more than 15 minutes to insert the rows. Is there any way to optimize it? Where is my problem (the PHP code or the database design)?

here are the details:

- The row data are stored in a UTF-8 .txt file; the values are separated by "\t" tab characters and each row sits on one line of the file, like this:

string view:

"value1\tvalue2\tvalue3\value4\value5\r\nvalue1\tvalue2\tvalue3\value4\value5\r\nvalue1\tvalue2\tvalue3\value4\value5\r\nvalue1\tvalue2\tvalue3\value4\value5\r\n"

text reader view:

value1     value2     value3     value4     value5
value1     value2     value3     value4     value5
value1     value2     value3     value4     value5
value1     value2     value3     value4     value5

- The database has 3 tables:

table1 `countries`, 1 field: NAME varchar (primary key)
table2 `products`, 2 fields: HS varchar (primary key), NAME varchar
table3 `imports`, 6 fields: product_hs varchar (foreign key → products(HS)),
counteryname varchar (foreign key → countries(NAME)),
year year,
units int,
weight int,
value int
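For reference, the schema described above could be sketched in SQL like this (the VARCHAR lengths and the InnoDB engine are my assumptions, not from the original post):

```sql
-- Hypothetical DDL matching the description above; lengths are assumed.
CREATE TABLE countries (
    NAME VARCHAR(100) PRIMARY KEY
);

CREATE TABLE products (
    HS   VARCHAR(20) PRIMARY KEY,
    NAME VARCHAR(100)
);

CREATE TABLE imports (
    product_hs   VARCHAR(20),
    counteryname VARCHAR(100),
    year   YEAR,
    units  INT,
    weight INT,
    value  INT,
    FOREIGN KEY (product_hs)   REFERENCES products(HS),
    FOREIGN KEY (counteryname) REFERENCES countries(NAME)
) ENGINE=InnoDB;
```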

- The PHP code was like this:

$conn = new mysqli($hn, $un, $pw, $db);
if ($conn->connect_error) { die($conn->connect_error); }

$x = 0; // row counter
ini_set('max_execution_time', 3000);

while (!feof($filehandle)) {
    $x++;
    echo $x . ":  ";
    $fileline = fgets($filehandle);
    $fields = explode("\t", $fileline);
    $query = "INSERT INTO imports(product_hs,counteryname,year,units,weight,value) VALUES('"
        . $fields[0] . "','" . $fields[1] . "','2014','"
        . $fields[2] . "','" . $fields[3] . "','" . $fields[4] . "');";
    $result = $conn->query($query);
    if (!$result) {
        echo $conn->error . "</br>";
    } else {
        echo $result . "</br>";
    }
}

First I thought it was an index problem slowing down the insertion, so I removed all the indexes from the "imports" table, but it didn't go any faster! Is the problem in the database design or in my PHP code?

Also note that the browser shows "waiting for response from the server" for the first 5 minutes, then for most of the remaining time it shows "transferring data from server". Is this because the response HTML has more than 40,000 lines for the row counter (`1:1 </br> 2:1 </br> ...`, produced by the PHP code above)?

Please bear in mind that I'm a complete newbie, thanks.

- **WARNING**: When using `mysqli` you should be using [parameterized queries](http://php.net/manual/en/mysqli.quickstart.prepared-statements.php) and [`bind_param`](http://php.net/manual/en/mysqli-stmt.bind-param.php) to add user data to your query. **DO NOT** use string interpolation or concatenation to accomplish this, because you have created a severe [SQL injection bug](http://bobby-tables.com/). **NEVER** put arbitrary user data directly into a query; it always requires escaping. – tadman May 18 '16 at 16:13
- This might help: http://stackoverflow.com/a/780045/752527 – Hanlet Escaño May 18 '16 at 16:13
- If you have well-formatted data, why not use [`LOAD DATA INFILE`](http://dev.mysql.com/doc/refman/5.7/en/load-data.html)? – tadman May 18 '16 at 16:13
- Re: the "row counter output". Why not just comment those lines out and find out? – Uueerdo May 18 '16 at 16:24
- The `load data` operation might be a good idea, but I just wanted to point out that prepared statements are also significantly faster, since the query is analyzed only once and then records can be inserted extremely fast. – Julie Pelletier May 18 '16 at 16:25
- +1 for `LOAD DATA INFILE`. Estimate: 2 to some seconds. You should try it; even if your file is not in the correct format, it might be faster to read the file into PHP and write a new, formatted file to use for LOAD DATA (doing that may be a good idea anyway, to check for errors). With prepared statements you might reach similar speeds, though. – Solarflare May 18 '16 at 16:45
- @tadman Yes, this function boosted the speed amazingly, as you can see in the test results in the answer. Thank you very much, and thank you also for the resources on the SQL injection bug :) – Accountant م May 19 '16 at 02:22
- @Solarflare Yes, exactly 5 seconds, as you estimated. I shared the test results in an answer, thank you. – Accountant م May 19 '16 at 02:24
- @Uueerdo Unfortunately, it wasn't a response-HTML problem; it was a bad-PHP-code problem that issued a separate query for every line of the data. I tried your suggestion, but it just delayed the response to the end of the PHP script, since there were no `echo`s left in the loop. Thank you. – Accountant م May 19 '16 at 02:27
- @HanletEscaño Yes, it did help a lot, especially the part about gathering all statements into one INSERT statement. I didn't try it because `LOAD DATA INFILE` did the job, but I will give it a try later, thanks :) – Accountant م May 19 '16 at 02:35
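The "gather all statements into one INSERT" idea from Hanlet Escaño's link could be sketched like this. This is an untested sketch, not code from the original post; `$conn` and `$filehandle` are assumed to be set up as in the question, and the batch size of 500 is an arbitrary choice:

```php
<?php
// Sketch: group rows into multi-row INSERT statements of up to 500 rows each,
// so the server parses one statement per batch instead of one per row.
$batch = [];
$batchSize = 500;
$insertHead = "INSERT INTO imports(product_hs,counteryname,year,units,weight,value) VALUES ";

while (($fileline = fgets($filehandle)) !== false) {
    $fields = explode("\t", rtrim($fileline, "\r\n"));
    if (count($fields) < 5) { continue; } // skip malformed lines

    // Escape each value properly instead of concatenating it raw.
    $escaped = array_map(function ($v) use ($conn) {
        return "'" . $conn->real_escape_string($v) . "'";
    }, [$fields[0], $fields[1], '2014', $fields[2], $fields[3], $fields[4]]);

    $batch[] = '(' . implode(',', $escaped) . ')';

    if (count($batch) === $batchSize) {
        $conn->query($insertHead . implode(',', $batch));
        $batch = [];
    }
}
if ($batch) { // flush the final partial batch
    $conn->query($insertHead . implode(',', $batch));
}
```

This cuts round trips and parse overhead, though as the answer below shows, `LOAD DATA INFILE` is still the fastest option for a file that is already well formatted.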

1 Answer


Thank you very much tadman, Hanlet Escaño, Uueerdo, Julie Pelletier and Solarflare for helping me in the comments.

I made 3 different changes to my PHP code using the approaches suggested in the comments, then tested them. Here are the test results.

The conclusion of the 3 tests: as tadman suggested, the key is `LOAD DATA INFILE`. It dramatically reduced the execution time to less than 7 seconds. These are the 3 tests.


ORIGINAL CODE: ~ 26 minutes



TEST 1 : ~ 34 minutes


(As Uueerdo suggested, I removed the echo statements and the row counter from the loop.)

while (!feof($filehandle)) {
    // $x++;            // commented out
    // echo $x . ":  "; // commented out
    $fileline = fgets($filehandle);
    $fields = explode("\t", $fileline);
    $query = "INSERT INTO products(hs,arabicname,englishname) VALUES('"
        . str_replace("'", ".", $fields[0]) . "','"
        . str_replace("'", ".", $fields[1]) . "','"
        . str_replace("'", ".", $fields[2]) . "');";
    $result = $conn->query($query);
    /* // commented out
    if (!$result) {
        echo $conn->error . "</br>";
    } else {
        echo $result . "</br>";
    }
    */
}
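Julie Pelletier's prepared-statement suggestion was not benchmarked here, but a sketch of it would look something like this (untested; `$conn` and `$filehandle` are assumed to be set up as in the question):

```php
<?php
// Prepare once, execute many times: the query is parsed a single time,
// and bind_param handles quoting/escaping of the data.
$stmt = $conn->prepare(
    "INSERT INTO products(hs, arabicname, englishname) VALUES (?, ?, ?)"
);
$stmt->bind_param("sss", $hs, $arabic, $english);

$conn->autocommit(false); // one transaction for the whole file
while (($fileline = fgets($filehandle)) !== false) {
    $parts = explode("\t", rtrim($fileline, "\r\n"));
    if (count($parts) < 3) { continue; } // skip malformed lines
    // bind_param bound these variables by reference, so reassigning
    // them is enough before each execute().
    list($hs, $arabic, $english) = $parts;
    $stmt->execute();
}
$conn->commit();
$stmt->close();
```

Besides the speed gain, this also removes the SQL injection problem that the `str_replace("'", ".", ...)` workaround only papers over.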

TEST 2 : ~ 7 seconds


(As tadman said, I looked up LOAD DATA INFILE and it was super powerful.)

// Replace the entire loop with this single query
$query = "LOAD DATA LOCAL INFILE '"
    . addslashes("C:\\xampp\\htdocs\\bots\\impandexp\\imports.txt")
    . "' INTO TABLE imports FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n'"
    . " (product_hs,counteryname,units,weight,value) SET year = '2014';";
$result = $conn->query($query);

TEST 3 : ~ 5 seconds


It was the same as test 2, except that on the same page tadman linked I found useful tips that help maximize the speed:

Bulk Data Loading for InnoDB Tables

// Turn off checks that might slow down bulk data insertion
$query = "SET foreign_key_checks=0;";
$conn->query($query);
$query = "SET unique_checks=0;";
$conn->query($query);
$query = "SET autocommit=0;";
$conn->query($query);

$query = "LOAD DATA LOCAL INFILE '"
    . addslashes("C:\\xampp\\htdocs\\bots\\impandexp\\imports.txt")
    . "' INTO TABLE imports FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n'"
    . " (product_hs,counteryname,units,weight,value) SET year = '2014';";
$result = $conn->query($query);
echo $result . "</br>";

// Turn them back on again
$query = "SET foreign_key_checks=1;";
$conn->query($query);
$query = "SET unique_checks=1;";
$conn->query($query);
$query = "COMMIT;";
$conn->query($query);
- Good work on getting it running. Just remember that `addslashes` is not the right tool for this job; `bind_param` is. The difference might seem academic, but it's an important distinction: one is a very superficial treatment of the string that leaves you exposed to SQL injection problems, while the other can handle any kind of data you throw at it. Parameterized queries also solve a lot of nasty debugging problems, since your data and query are kept separate. – tadman May 19 '16 at 03:22
- Also worth mentioning: most MySQL servers run with default InnoDB settings. Not only can you increase the [size of the buffer pool using `innodb_buffer_pool_size`](http://dev.mysql.com/doc/refman/5.7/en/innodb-buffer-pool.html), but you can also use other [performance tricks](http://dev.mysql.com/doc/refman/5.7/en/optimizing-innodb-bulk-data-loading.html) if necessary. – tadman May 19 '16 at 03:23
- @tadman I used `addslashes()` because I couldn't keep any slashes without it; I was getting an error that it can't find the file `C:xampphtdocsbotsimpandexpimports.txt`. Why did PHP remove all the slashes? Is there any other way to add the slashes besides `addslashes()`? Thanks. – Accountant م May 19 '16 at 08:08
- Try doing `LOAD DATA LOCAL INFILE ? INTO TABLE ...`, then use `prepare`, `bind_param` and `execute`. That leads to predictable, consistent results. – tadman May 19 '16 at 08:12
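As an alternative to `addslashes` for the file path, `mysqli::real_escape_string` is designed specifically for escaping values inside SQL string literals (backslashes included). A hedged sketch of test 2 rewritten that way, assuming the same `$conn` and file path as in the answer:

```php
<?php
// Escape the Windows path for use inside the SQL string literal.
// The doubled backslashes in the PHP literal produce single backslashes,
// which real_escape_string then doubles again for MySQL's parser.
$path = "C:\\xampp\\htdocs\\bots\\impandexp\\imports.txt";

$query = "LOAD DATA LOCAL INFILE '" . $conn->real_escape_string($path) . "'"
    . " INTO TABLE imports"
    . " FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\r\n'"
    . " (product_hs,counteryname,units,weight,value)"
    . " SET year = '2014';";

if (!$conn->query($query)) {
    echo $conn->error; // report why the load failed
}
```

The slashes disappeared in the error message because MySQL itself treats `\x` inside a string literal as an escape sequence, so each backslash must arrive doubled; both `addslashes` and `real_escape_string` do that, but only the latter is connection- and charset-aware.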