
I have to write a PHP script that will import data from a given CSV file into a MySQL database. The given CSV file can contain up to 200,000 rows. I tried the following, but problems arose:

  1. LOAD DATA LOCAL INFILE: I cannot use the LOAD DATA LOCAL INFILE statement because I want to do some validations BEFORE uploading the rows. Also, our DB admin doesn't want me to use that statement, and I don't know why.
  2. FOR LOOP: Inserting line by line inside a FOR loop takes too much time and results in a connection timeout.

Now I am thinking of splitting the CSV file into smaller chunks and then inserting them asynchronously. I am already done with splitting the CSV, but I currently have no idea how to insert into my database asynchronously in a quick and safe way. I heard that I will be using Ajax here.

Any solution you can recommend? Thanks a lot in advance!

Mead Umandal
  • You can do validations with `LOAD DATA LOCAL INFILE`, just create a trigger on the table in which you do the import and let the trigger do the validations. – Daan Sep 10 '15 at 14:36
  • 200k rows is tiny for load data local infile, sounds like the way to go. You can also do validations before and re-write the file before importing. – AdrianBR Sep 10 '15 at 14:37
  • I really understand that LOAD DATA LOCAL INFILE is the best way to go. If it is just my standards, I would do it that way. But I am tasked to do it asynchronously. Also, I have no access to modify our database. Really, my boss just wanted me to write a PHP script that will load these rows into MySQL DB. What's the best way next to LOAD DATA LOCAL INFILE? – Mead Umandal Sep 10 '15 at 14:44
  • Why is your boss asking to do this via PHP? Seems like it should be a job for something like MySQL Workbench, with that many records. – terrorfall Sep 10 '15 at 15:34

4 Answers


Thanks to everyone who gave answers to this question. I have discovered a solution! I just wanted to share it, in case someone needs to create a PHP script that will import a huge CSV file into a MySQL database (asynchronously and fast!). I have tested my code with 400,000 rows and the import is done in seconds. I believe it would work with larger files; you just have to modify the maximum upload file size.

In this example, I will be importing a CSV file that contains two columns (name, contact_number) into a MySQL DB that contains the same columns.

Your CSV file should look like this:

Ana, 0906123489
John, 0908989199
Peter, 0908298392
...
...

So, here's the solution.

First, create your table

CREATE TABLE `testdb`.`table_test` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `contact_number` VARCHAR(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB;

Second, I have 4 PHP files. All you have to do is place them in a single folder. The PHP files are as follows:

index.php

<form action="upload.php" method="post" enctype="multipart/form-data">
<input type="file" name="csv" value="" />
<input type="submit" name="submit" value="Save" /></form>

connect.php

<?php
//modify your connections here
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "testdb";
$conn = new mysqli($servername, $username, $password, $dbname);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} 
?>

senddata.php

<?php
include('connect.php');
$data = $_POST['file'];
$handle = fopen($data, "r");
if ($handle) {
    //instead of executing queries one by one,
    //let us prepare 1 SQL query that will insert all values from the batch
    $sql = "INSERT INTO table_test(name,contact_number) VALUES ";
    while (($line = fgets($handle)) !== false) {
      $sql .= "($line),";
    }
    //remove the trailing comma before executing the query
    $sql = rtrim($sql, ',');
    if ($conn->query($sql) !== TRUE) {
        echo "Error: " . $conn->error;
    }
    fclose($handle);
}
//unlink the CSV file once it has been imported to the DB, to clear the directory
unlink($data);
?>
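
Note that senddata.php interpolates each raw chunk line straight into the SQL string, which only works while the chunk files already contain values in the form 'value', 'value' with no stray quotes inside them. If your source data may contain quoted fields or embedded commas (as asked in the comments below), one possible variation, not part of the original answer, is to write the chunks as plain CSV (e.g. with fputcsv($file, $data) in upload.php instead of the fwrite() line) and let senddata.php parse and escape each row itself:

<?php
include('connect.php');
$data = $_POST['file'];
if (($handle = fopen($data, "r")) !== false) {
    $values = array();
    //fgetcsv() handles quoted fields and embedded commas
    while (($row = fgetcsv($handle)) !== false) {
        if (count($row) < 2) { continue; } //skip malformed lines
        $name   = $conn->real_escape_string(trim($row[0]));
        $number = $conn->real_escape_string(trim($row[1]));
        $values[] = "('$name', '$number')";
    }
    fclose($handle);
    if ($values) {
        $sql = "INSERT INTO table_test(name, contact_number) VALUES " . implode(',', $values);
        if ($conn->query($sql) !== TRUE) {
            echo "Error: " . $conn->error;
        }
    }
}
unlink($data);
?>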

upload.php

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/1.11.1/jquery.js"></script>
<script>
//Declaration of the function that will insert data into the database
function senddata(filename) {
    var file = filename;
    $.ajax({
        type: "POST",
        url: "senddata.php",
        data: { file: file },
        async: true,
        success: function(html) {
            $("#result").html(html);
        }
    });
}
</script>
<!-- target element for the ajax responses from senddata.php -->
<div id="result"></div>
<?php
$batchsize = 1000; //split the huge CSV file into batches of 1,000 rows; you can modify this based on your needs
if($_FILES['csv']['error'] == 0){
    $name = $_FILES['csv']['name'];
    $nameparts = explode('.', $name);
    $ext = strtolower(end($nameparts));
    $tmpName = $_FILES['csv']['tmp_name'];
    if($ext === 'csv'){ //check if the uploaded file is in CSV format
        if(($handle = fopen($tmpName, 'r')) !== FALSE) {
            set_time_limit(0);
            $row = 0;
            while(($data = fgetcsv($handle)) !== FALSE) {
                //splitting of the CSV file: open a new chunk file at the start of every batch
                if ($row % $batchsize == 0):
                    $file = fopen("minpoints$row.csv", "w");
                endif;
                $min = $data[0];
                $points = $data[1];
                $line = "'$min', '$points'";
                fwrite($file, $line.PHP_EOL);
                //sending the split CSV files, batch by batch...
                if ($row % $batchsize == 0):
                    echo "<script> senddata('minpoints$row.csv'); </script>";
                endif;
                $row++;
            }
            fclose($file);
            fclose($handle);
        }
    }
    else
    {
        echo "Only CSV files are allowed.";
    }
    //alert once done.
    echo "<script> alert('CSV imported!') </script>";
}
?>

That's it! You now have a pure PHP script that can import a huge number of rows in seconds! :) (Thanks to my partner who taught me and gave me the idea of how to use Ajax.)

Mead Umandal
  • You literally saved my day! It works really good and it is unbelievable fast. There is only one thing I want to ask: where do I change the delimiter of CSV-Files? – RoyRobsen Apr 01 '16 at 09:40
  • @RoyRobsen look at fgetcsv($handle, $batchsize, ';') it worked for me – daremachine Jun 29 '16 at 00:59
  • any one has idea to add header in every minpoints$row.csv and ignore in imports ? – Rohit Oct 15 '16 at 11:27
  • It's work perfectly ! Thank you ! But How I can import CSV when line look like : **"Ana", "0906123489"** – Rocstar Feb 02 '18 at 16:00
  • no one answered Rohit? I was needing to do that too. I know how to ignore header line if there is one, but the problem is if there are multiple split files that the first line of each file will be ignored, so the amount of records that are import will be wrong. That is why it would be good to add the same header to each split file. – leoarce Sep 03 '21 at 14:32
  • actually, easier way to do that would be to remove header for 1st generated csv file, instead of adding header to all split files then having to ignore headers for each file. the way i figured out to do that is in between the first section that says "if ($row % $batchsize == 0):". right after the fopen line, use this: if ($row++ == 0) continue; – leoarce Sep 03 '21 at 15:34
  • even though the generated csv files were being created properly, the data was not saving to database table, so what I did next made it work. don't do what I said previously. instead do this... right above the set_time_limit line, add this: fgetcsv($handle,10000,","); – leoarce Sep 03 '21 at 17:23

The main slowness comes from sending every single line as its own request. I would suggest sending the query with every 1000 or 500 rows in the same format used by mysqldump --opt, i.e. build one long string like

 insert into datatable (name, prename, comment)
   values ('wurst', 'hans', 'someone')
   , ('bush', 'george', 'otherone')
   , ...
   ;

You should check how long your queries are allowed to be, or, if the MySQL server is under your control, you could extend the maximum query length via the max_allowed_packet setting.
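
A quick way to check that limit from PHP might look like this (a sketch assuming a mysqli connection with the same credentials as in the accepted answer; adjust as needed):

<?php
//read the server's maximum packet/query size so a safe batch size can be chosen
$conn = new mysqli('localhost', 'root', '', 'testdb');
$result = $conn->query("SHOW VARIABLES LIKE 'max_allowed_packet'");
if ($result && ($row = $result->fetch_assoc())) {
    echo "max_allowed_packet: " . $row['Value'] . " bytes\n";
}
?>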

If this is still too slow (and 200K rows is not much at all), then you could try to improve the CSV reading.

It is a bit of work to split the inserts into those chunks, but you could write a small chunk class for this, so adding the rows gets a bit easier.

The usage of this class would look like:

$chunk->prepare("insert into datatable (name, prename, comment) values");
$chunk->setSize(1000);

foreach ($row...){
   if($query = $chunk->addRow(...)){
       callUpdate($query);
   }
}
if($query = $chunk->clear()){
  callUpdate($query);
}
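
A minimal sketch of what such a chunk class could look like (the class name and internals are illustrative; only the methods from the usage above are taken as given):

<?php
//illustrative chunk class matching the usage sketched above
class InsertChunk
{
    private $prefix = '';
    private $size = 1000;
    private $rows = array();

    //set the common "insert into ... values" prefix
    public function prepare($prefix) { $this->prefix = $prefix; }

    //number of rows to collect before a query is emitted
    public function setSize($size) { $this->size = $size; }

    //add one row (already escaped/quoted values, comma separated);
    //returns a complete query once the chunk is full, otherwise null
    public function addRow($values) {
        $this->rows[] = '(' . $values . ')';
        if (count($this->rows) >= $this->size) {
            return $this->clear();
        }
        return null;
    }

    //flush whatever is buffered; returns the query, or null if nothing is left
    public function clear() {
        if (!$this->rows) {
            return null;
        }
        $query = $this->prefix . ' ' . implode(', ', $this->rows) . ';';
        $this->rows = array();
        return $query;
    }
}
?>

With this, callUpdate($query) (whatever sends SQL to the server in your code) only runs once per chunk instead of once per row.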
flaschenpost

I would still use LOAD DATA LOCAL INFILE into a temporary table, use MySQL to validate, filter, clean, etc. with all the data in the DB, and then populate the destination table with the ready-to-go records.
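
A rough sketch of that flow with mysqli (the staging table, column names and file path are examples only; LOCAL INFILE has to be enabled on both the client and the server):

<?php
//sketch of the staging-table approach
$conn = mysqli_init();
$conn->options(MYSQLI_OPT_LOCAL_INFILE, true);
$conn->real_connect('localhost', 'root', '', 'testdb');

//1. load everything into a staging table as plain text
$conn->query("CREATE TEMPORARY TABLE staging_contacts (name VARCHAR(255), contact_number VARCHAR(255))");
$conn->query("LOAD DATA LOCAL INFILE '/path/to/file.csv'
              INTO TABLE staging_contacts
              FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
              LINES TERMINATED BY '\\n'");

//2. validate / clean inside the database, e.g. drop rows with missing values
$conn->query("DELETE FROM staging_contacts WHERE name = '' OR contact_number = ''");

//3. copy the ready-to-go records into the real table
$conn->query("INSERT INTO table_test (name, contact_number)
              SELECT TRIM(name), TRIM(contact_number) FROM staging_contacts");
?>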

Julio Soares
  • Dude, his boss said no to `LOAD DATA LOCAL INFILE`. – Jonathan M Sep 10 '15 at 14:47
  • As far as I remember, `LOAD DATA INFILE` does only work in setups without replication, so most installations in bigger applications would die on it (replication is useful for generating backups, for failover and for some other things). The temporary table is a good idea anyway, so you can make more complex checks very easy before pushing data into the real table. – flaschenpost Sep 11 '15 at 05:42

You can use fgetcsv() with PHP.

Here is an example:

// Connect to your database (adjust the credentials)
$oLink = mysqli_connect('localhost', 'user', 'password', 'database');

// Open the file with PHP in read mode
$oFile = fopen('PATH_TO_FILE', 'r');

// Browse your csv line per line
while (($aRow = fgetcsv($oFile)) !== false) {

    $sReqInsertData = ' INSERT
                        INTO
                            TABLENAME
                        SET
                            FIELD1 = "'.$aRow[0].'",
                            FIELD2 = "'.$aRow[1].'",
                            FIELD3 = "'.$aRow[2].'",
                            FIELD4 = "'.$aRow[3].'",
                            FIELD5 = "'.$aRow[4].'",
                            FIELD6 = "'.$aRow[5].'",
                            FIELD7 = "'.$aRow[6].'",
                            FIELD8 = "'.$aRow[7].'"';

    // Execute your sql with mysqli_query or something like this
    mysqli_query($oLink, $sReqInsertData);
}

// Close your file
fclose($oFile);
SatanicGeek
  • Thanks for the answer. But I think this is similar to what I mentioned in the question (item no. 2). This results in a connection timeout when applied to 200,000 rows. – Mead Umandal Sep 10 '15 at 15:03
  • [Increase](http://php.net/manual/en/function.set-time-limit.php) the timeout to 300 seconds – SatanicGeek Sep 22 '15 at 12:36
  • Shouldn't that 'w' in fopen() be an 'r'? I just got my input file truncated to zero-length by using your code... or am I missing something? – pgr Jan 16 '16 at 12:09
  • You should use R because you don't need to write into the file. If it works only with W, so use it. – SatanicGeek Jan 17 '16 at 18:27