
This is the main code of a method I use to import an Excel file (using Maatwebsite Laravel-Excel 2) into my database:

$data = Excel::selectSheetsByIndex(0)->load($file, function($reader) {})->get()->toArray();
DB::beginTransaction();
try {
    foreach ($data as $key => $value) {
        $med = trim($value["med"]);
        $serial = trim($value["nro.seriemedidor"]);

        DB::table('medidores')->insert([
            "med" => $med,
            "serial_number" => $serial
        ]);
    }
    DB::commit();
} catch (\Exception $e) {
    DB::rollback();
    return redirect()->route('myroute')->withErrors("Some error message");
}

This works fine with a "small" amount of data (say, fewer than 5000 rows in the Excel file). But I need to work with a large Excel file that has around 1.4 million rows, split across more than one sheet. How could I make my method faster? Any tips?
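
One thing I was also looking at, to avoid loading all 1.4 million rows into memory with get()->toArray(), is the chunk filter that Laravel-Excel 2.1 documents for large files. I haven't verified this against my version of the package, so take it as a rough sketch (the chunk size of 1000 is arbitrary):

Excel::filter('chunk')->load($file)->chunk(1000, function ($results) {
    // each $results is one chunk of rows instead of the whole file
    $bulk_data = [];
    foreach ($results as $row) {
        $bulk_data[] = [
            "med"           => trim($row["med"]),
            "serial_number" => trim($row["nro.seriemedidor"]),
        ];
    }
    // one bulk insert per chunk
    DB::table('medidores')->insert($bulk_data);
});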

EDIT: Here is the code from the link in one of the comments on the answer:

$data = Excel::selectSheetsByIndex(0)->load($file, function($reader) {})->get()->toArray();
DB::beginTransaction();
try {
  $bulk_data = [];
  foreach ($data as $key => $value) {
    $med = trim($value["med"]);
    $serial = trim($value["nro.seriemedidor"]);
    $bulk_data[] = ["med" => $med, "serial_number" => $serial];
  }
  $collection = collect($bulk_data); // turn the data into a collection
  $chunks = $collection->chunk(100); // split it into chunks of 100 rows
  // loop through the chunks and bulk insert each one:
  foreach ($chunks as $chunk) {
    DB::table('medidores')->insert($chunk->toArray());
  }
  DB::commit();
} catch (\Exception $e) {
  DB::rollback();
  return redirect()->route('myroute')->withErrors("Some error message");
}

The chunking approach is what worked for me.
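
Since the file is split across more than one sheet, something like this is what I plan to try next. It's only a rough sketch: the sheet count of 3 is a placeholder for the real number of sheets, and I'm assuming each sheet can be loaded separately with selectSheetsByIndex():

$sheetCount = 3; // placeholder, adjust to the real file
DB::beginTransaction();
try {
    for ($i = 0; $i < $sheetCount; $i++) {
        // load one sheet at a time instead of the whole workbook
        $data = Excel::selectSheetsByIndex($i)->load($file, function($reader) {})->get()->toArray();

        $bulk_data = [];
        foreach ($data as $value) {
            $bulk_data[] = [
                "med"           => trim($value["med"]),
                "serial_number" => trim($value["nro.seriemedidor"]),
            ];
        }

        // array_chunk avoids building an intermediate Collection
        foreach (array_chunk($bulk_data, 500) as $chunk) {
            DB::table('medidores')->insert($chunk);
        }
    }
    DB::commit();
} catch (\Exception $e) {
    DB::rollback();
    return redirect()->route('myroute')->withErrors("Some error message");
}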

1 Answer


Yes, you can. Instead of performing X (number of database requests) * N (number of sheets) individual inserts, try a simple bulk insert; it only costs you the complexity of looping over the data and saves you those X * N database requests. Here's an example:

$data = Excel::selectSheetsByIndex(0)->load($file, function($reader) {})->get()->toArray();
DB::beginTransaction();
try {
  $bulk_data = [];
  foreach ($data as $key => $value) {
    $med = trim($value["med"]);
    $serial = trim($value["nro.seriemedidor"]);
    $bulk_data[] = ["med" => $med, "serial_number" => $serial];
  }
  DB::table('medidores')->insert($bulk_data);
  DB::commit();
} catch (\Exception $e) {
  DB::rollback();
  return redirect()->route('myroute')->withErrors("Some error message");
}

You can refer to this answer for more explanation about DB requests: https://stackoverflow.com/a/1793209/8008456

  • Mmm, I know that makes sense, but I'm benchmarking my code: with my code it takes 9 seconds to insert 5000 rows, and with the array at the end it took 14 seconds... I can make a video of it if you don't believe me. – pmiranda May 14 '18 at 16:58
  • And actually I have tested this on MariaDB and PostgreSQL, in 2 different projects, with Laravel 5.4 and 5.6. It was always faster with the inserts inside the foreach; I don't know why. – pmiranda May 16 '18 at 20:16
  • Nice! I tested it with another insert that I have and it took 30 seconds instead of 45. 50% faster. I will try it with another. – pmiranda May 17 '18 at 19:17
  • Seems like you've got to distribute the complexity between database requests and the inner loop over the bulk data while inserting (*not the loop over the main data*); try increasing/decreasing the chunk size and see what happens. – TheSalamov May 17 '18 at 19:23
  • I tested using chunks of 5, 100, and 500; it doesn't change much, it always takes ~30 seconds to complete the insert (3,649 rows, 12 columns). But this is great for me. – pmiranda May 17 '18 at 19:35
  • Try this package whenever you're free: https://github.com/rap2hpoutre/fast-excel#benchmarks. It is a wrapper over the Spout package, which is believed to give better results; I haven't tried it yet, but the benchmarks seem interesting (a rough sketch of what it could look like is below). Edit: (it is mentioned not to trust their benchmarks LOL) – TheSalamov May 17 '18 at 20:03
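
For reference, a minimal sketch of what the import could look like with rap2hpoutre/fast-excel, assuming its import() call returns rows keyed by the sheet's header names as described in the package's README (the keys below are the same ones used above and may need adjusting to the real headers; nothing here is benchmarked):

use Rap2hpoutre\FastExcel\FastExcel;

// read the sheet into a collection of rows
$rows = (new FastExcel)->import($file);

$bulk_data = [];
foreach ($rows as $row) {
    $bulk_data[] = [
        "med"           => trim($row["med"]),
        "serial_number" => trim($row["nro.seriemedidor"]),
    ];
}

// bulk insert in chunks to keep each query a reasonable size
foreach (array_chunk($bulk_data, 500) as $chunk) {
    DB::table('medidores')->insert($chunk);
}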