
Okay, so I get around 100k-1M lines of text that I always import to a database. The code that I use is as follows:

$lines = new SplFileObject('/home/file.txt');
while(!$lines->eof()) {
    $lines->next();       //Skipping first line
    $row = explode(',',$lines);
    for($i = 0; $i<4; $i++){
        if(!isset($row[$i])){
            $row[$i] = null;
        }
    }
    $y = (float) $row[1];
    $z = (float) $row[2];
    $load_query = "INSERT IGNORE INTO new (datetime_gmt,field2,field3)
    VALUES ('".$row[0]."','".$y."','".$z."');";

    if(!$mysqli->query($load_query)){
      die("CANNOT EXECUTE".$mysqli->error."\n");
    }
}
$lines = null;

However, it takes waaayyy too long. Is there any faster way to do it, or am I stuck with this method?

PS. I don't want to use MySQL's "LOAD DATA INFILE".

  • [How to do a batch insert](https://stackoverflow.com/questions/5526917/how-to-do-a-batch-insert-in-mysql) – castis Jun 06 '17 at 13:45
  • You just need to read what castis quoted, but you should also limit the batch to 20-50 values (for instance) – ka_lin Jun 06 '17 at 13:47
  • You can write bulk insert statements in chunks to minimize the number of calls to the database. If you don't care about the order of insertion into the database, then you can use a language that supports threading such as C(++, #), Go, Rust, Java, D, Swift, Scala, etc.... – Joshua Jones Jun 06 '17 at 13:52
  • @castis I'm sorry, I'm struggling to find how this would help. I understand that I can add multiple rows using insert, but how would I do it for 100k rows? – Abdallah Qaraeen Jun 06 '17 at 14:05
  • How do you insert your data into the database? After each $load_query? – enzo Jun 06 '17 at 14:14
  • Sorry, forgot to include that; I have edited the code :) – Abdallah Qaraeen Jun 06 '17 at 14:18
  • @AbdallahQaraeen you wouldn't add 100k rows at once, you chunk it up into like 1000 rows or so for each insert. – castis Jun 06 '17 at 14:25
  • @castis Okay, do I manually type those 1000 rows? I'm trying to figure out how I would use a for-loop or something similar, but I'm unable to. Do you have any suggestions? :) – Abdallah Qaraeen Jun 06 '17 at 14:28
  • https://dev.mysql.com/doc/refman/5.7/en/load-data.html – AbraCadaver Jun 06 '17 at 14:42
  • @AbraCadaver Thanks for the help. But I have mentioned that I would like to avoid "LOAD DATA INFILE". :) – Abdallah Qaraeen Jun 06 '17 at 14:43

1 Answer


As written, you're running an insert statement for every line. It will be much faster if you build a single multi-row statement in the format of INSERT INTO table (foo, bar) VALUES (1, 2), (3, 4), (5, 6); and execute it once at the end. Something along the lines of this, though it could be cleaned up further:

$lines = new SplFileObject('/home/file.txt');
$lines->fgets();      // skip the header line once, before the loop
$load_query = "INSERT IGNORE INTO new (datetime_gmt,field2,field3)
    VALUES ";
while (!$lines->eof()) {
    $row = explode(',', $lines->fgets());   // read the current line as a string
    for ($i = 0; $i < 4; $i++) {
        if (!isset($row[$i])) {
            $row[$i] = null;
        }
    }
    $y = (float) $row[1];
    $z = (float) $row[2];
    $load_query .= "('" . $row[0] . "','" . $y . "','" . $z . "'),";
}

if (!$mysqli->query(rtrim($load_query, ','))) {
    die("CANNOT EXECUTE: " . $mysqli->error . "\n");
}
$lines = null;

Also, make sure the data is trusted. If the file can come from an outside user, appending values directly to the query string creates an SQL injection vector.
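One caveat with a single giant statement: it can exceed MySQL's max_allowed_packet limit, so in practice the rows are usually flushed in fixed-size chunks, one multi-row INSERT per chunk. A minimal sketch of the chunking, using the table and columns from the question; buildChunkedInserts is a hypothetical helper, and addslashes() stands in for $mysqli->real_escape_string() only so the sketch runs without a live connection:

```php
<?php
// Hypothetical helper: build one multi-row INSERT per chunk so that no
// single statement exceeds MySQL's max_allowed_packet.
function buildChunkedInserts(array $rows, int $chunkSize): array
{
    $statements = [];
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        $values = [];
        foreach ($chunk as $row) {
            // addslashes() stands in for $mysqli->real_escape_string()
            // so the sketch runs without a live connection.
            $values[] = "('" . addslashes($row[0]) . "','"
                      . (float) $row[1] . "','" . (float) $row[2] . "')";
        }
        $statements[] = "INSERT IGNORE INTO new (datetime_gmt,field2,field3) VALUES "
                      . implode(',', $values);
    }
    return $statements;
}
```

Each returned statement would then be passed to $mysqli->query() in turn; something like 1000 rows per chunk is a reasonable starting point.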

Elle H
  • yes, multi-inserts will run a lot faster. however, be careful: mysql has a maximum data packet size limit (adjustable on server). make sure your sql statement doesnt reach that limit. For instance do it as 1000 rows at once, then a new insert statement. if you do pass that limit, the sql statement will be corrupt and wont run. – Tuncay Göncüoğlu Jun 06 '17 at 14:56
  • using a transaction might also help a bit. not as much as multi-inserts tho. – Tuncay Göncüoğlu Jun 06 '17 at 14:57
  • I'm pretty sure this will work. I am, though, reaching my maximum data packet size, and I'm not allowed to increase it. How can I do it as 1000 at once, for example, using the code @Zurahn used? – Abdallah Qaraeen Jun 06 '17 at 15:24
  • Nevermind, I fixed it using the seek() function in SplFileObject! Thanks a lot! :D – Abdallah Qaraeen Jun 06 '17 at 16:17
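The prepared-statement-plus-transaction route mentioned in the comments is another option. A sketch, with the parsing split into its own function so it can run without a database; parseRows is a hypothetical name, and the CSV layout is assumed from the question:

```php
<?php
// Hypothetical helper: parse the CSV into [datetime, float, float] rows,
// skipping the header and any blank or short lines.
function parseRows(SplFileObject $file): array
{
    $rows = [];
    $file->fgets(); // skip the header line once
    while (!$file->eof()) {
        $row = explode(',', trim($file->fgets()));
        if (count($row) < 3) {
            continue; // skip blank or short lines
        }
        $rows[] = [$row[0], (float) $row[1], (float) $row[2]];
    }
    return $rows;
}

// With a live connection, the rows would then be inserted with one
// reusable prepared statement inside a single transaction:
//
//   $mysqli->begin_transaction();
//   $stmt = $mysqli->prepare(
//       "INSERT IGNORE INTO new (datetime_gmt, field2, field3) VALUES (?, ?, ?)");
//   foreach (parseRows(new SplFileObject('/home/file.txt')) as [$dt, $y, $z]) {
//       $stmt->bind_param('sdd', $dt, $y, $z);
//       $stmt->execute();
//   }
//   $mysqli->commit();
```

Binding parameters also addresses the SQL injection concern from the answer, since values never touch the query string.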