
I'm provided with a .txt file every day which contains semicolon-separated data. Users of my app are meant to upload this file to the database daily.

Currently, I'm reading and storing the information as such:

// Split the file contents into lines, then split each line on the semicolon delimiter
$array = array();
$csv = str_getcsv($request->file, "\n");
foreach ($csv as &$row) {
    $row = str_getcsv($row, ";");
    $array[] = $row;
}
array_splice($array, 0, 1); // drop the first (header) row

foreach ($array as &$row) {
    // firstOrNew runs a SELECT for every row, and save() below runs an INSERT or UPDATE for it
    $query = Table::firstOrNew(['col2' => $row[1], 'col3' => $row[2]]);
    $query->col1 = $row[0];
    $query->col2 = $row[1];
    $query->col3 = $row[2];
    $query->col4 = $row[3];
    $query->col5 = $row[4];
    // [...]
    $query->col72 = $row[71];
    $query->col73 = $row[72];
    $query->save();
}

The problem with this method is that it takes too long to run: the volume of data is about 5,000 entries a day, which takes roughly 2 minutes to complete with the code above. As you can see, the number of columns is immense, the data has to be read as if it were a .CSV file, and I cannot discard any of the columns at all.

Not to mention the problem gets much worse if, for some reason, one or more users have to upload several days' or even a month's worth of data at once.

I need to figure out a better way to handle this. I've searched around for a solution, but the best I could find was the suggestion to use a for loop instead of foreach, which didn't really solve the issue.

Mujahid Bhoraniya
  • Who uses the data? Does the uploader have sole access to it or is it available to all? Does the data itself need to be loaded into the db, and not just the save path? – Professor Abronsius Mar 04 '20 at 07:19
  • Does it take two minutes of the user's time or the server's time? What I mean is: can the user upload one file and then the next one 10 seconds later? – Andreas Mar 04 '20 at 07:34
  • The data is accessible by all users, and the user cannot upload another file until the current upload is done. This does not prevent other users from uploading. –  Mar 04 '20 at 12:35
  • Also, data needs to be loaded into db. –  Mar 04 '20 at 12:51

2 Answers


You are checking, for each row, whether it already exists, updating it if it does and inserting it if it doesn't, right? If so, you can't make this code run much faster unless you have a unique column (or unique index) for the rows and run raw queries with ON DUPLICATE KEY UPDATE; see this: Insert into a MySQL table or update if exists
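For illustration, a raw statement could look roughly like this, assuming col2 and col3 are covered by a UNIQUE index so MySQL can detect the duplicate (the table name my_table and the abbreviated column list are placeholders mirroring the question):

// Assumes a UNIQUE index on (col2, col3); my_table and the column list are placeholders
DB::statement(
    "INSERT INTO my_table (col1, col2, col3 /* ... col73 */)
     VALUES (?, ?, ? /* ... */)
     ON DUPLICATE KEY UPDATE
         col1 = VALUES(col1) /* repeat for every column that should be refreshed */",
    [$row[0], $row[1], $row[2] /* ... */]
);

On Laravel 8 and newer the query builder also offers an upsert() method that generates this kind of statement for you.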

A second solution is to delete all the old records that belong to that file, that user, or some other unique value that can't be uploaded twice, and then insert the new rows in chunks with the insert method; that will be much faster. Example:

DB::beginTransaction(); // DB is the Illuminate\Support\Facades\DB facade

try {
    // Remove the previous rows for this file/user so the fresh data can be inserted cleanly
    Table::where('unique_file_rows_identified_column', $something)->delete();

    // Build and run one multi-row INSERT per chunk of 1000 rows instead of one query per row
    foreach (array_chunk($array, 1000) as $rows) {
        $rows_to_insert = [];
        foreach ($rows as $row) {
            $rows_to_insert[] = [
                'col1' => $row[0],
                'col2' => $row[1],
                'col3' => $row[2],
                'col4' => $row[3],
                'col5' => $row[4],
                // [...]
                'col72' => $row[71],
                'col73' => $row[72],
            ];
        }

        Table::insert($rows_to_insert);
    }
} catch (\Exception $e) { // If anything throws, the rollback restores the deleted rows
    DB::rollBack();
    dd($e->getMessage());
}
DB::commit();

This will run only a handful of queries for a 5,000-row file (one DELETE plus five 1,000-row INSERTs), so the import will finish much faster.
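One caveat worth noting here: Table::insert() goes straight to the query builder, so model events don't fire and created_at / updated_at are not filled in the way save() fills them. If the table has those timestamp columns, set them on each row yourself, for example:

$now = now(); // Laravel helper returning the current time

$rows_to_insert[] = [
    'col1' => $row[0],
    // [...]
    'col73' => $row[72],
    'created_at' => $now,
    'updated_at' => $now,
];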

Malkhazi Dartsmelidze
  • When using multiple (bulk) update() / save() / ::insert() calls it's good practice to use try/catch – Digvijay Mar 04 '20 at 07:36

Would it be an option to let the database do the work for you?

LOAD DATA INFILE '/tmp/user_data.csv' INTO TABLE test FIELDS TERMINATED BY ';';

https://dev.mysql.com/doc/refman/8.0/en/load-data.html

You need to be sure that the CSV is valid, of course.
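A rough sketch of driving this from the Laravel side, assuming the posted contents are written to a temporary file first and that LOCAL INFILE is allowed on both ends (the server's local_infile setting and PDO::MYSQL_ATTR_LOCAL_INFILE => true in the connection's options array); LOCAL is used so the web server, rather than the database server, reads the file, and the table name test and the ';' delimiter are taken from the statement above:

// Write the posted data to a file the MySQL client can read
// ($request->file is assumed to hold the raw text, as in the question's code)
$path = storage_path('app/import.txt');
file_put_contents($path, $request->file);

$pdo = DB::connection()->getPdo();

$sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($path) . "
        INTO TABLE test
        FIELDS TERMINATED BY ';'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES"; // IGNORE 1 LINES skips the first line, like array_splice() in the question

$pdo->exec($sql);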

Johannes