
I am trying to export data (nightly) out of an archaic system into something that's a little bit easier to write code for; however, the amount of data being exported is MASSIVE.

I am dumping the data as JSON objects and then trying to updateOrCreate() using Laravel, but I have way too much data. It usually takes about 10 hours to process all of the data into the 2.5 MILLION MySQL rows.

I think it is slow because it is actually performing 5 MILLION queries.

Does anyone know of any good example of how to efficiently execute a MASSIVE updateOrCreate() using Laravel?
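For reference, the import loop currently looks roughly like this (the model, key, and column names below are simplified placeholders):

// One updateOrCreate() per record: Eloquent first runs a SELECT to look for a
// match, then an INSERT or an UPDATE, which is how ~2.5 million records turn
// into roughly 5 million queries. ExportedRecord is a placeholder model name.
foreach ($jsonRecords as $record) {
    ExportedRecord::updateOrCreate(
        ['legacy_id' => $record['legacy_id']],                   // lookup key (placeholder)
        ['name' => $record['name'], 'value' => $record['value']] // columns to set (placeholder)
    );
}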

    You're probably going to want to do this outside of Eloquent. – ceejayoz Jul 25 '17 at 21:51
  • ORMs are not particularly great at handling lots of data. Use plain old SQL for this: INSERT ... SELECT ... for inserts, multi-table updates for updates (see the sketch after these comments). – Shadow Jul 25 '17 at 22:17
  • You might want to at least chunk the data to avoid filling up memory before processing. How about attempting this method: https://stackoverflow.com/a/44891984/5704410 – Oluwatobi Samuel Omisakin Jul 25 '17 at 23:07
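To make the raw-SQL suggestion above more concrete, here is a rough sketch that combines both comments: load the nightly dump into a staging table in chunks, then reconcile it with two set-based statements. The table and column names (import_staging, your_table, legacy_id, name, value) are assumptions for illustration only:

use Illuminate\Support\Facades\DB;

// Start with an empty staging table for tonight's dump.
DB::table('import_staging')->truncate();

// 1. Bulk-load the decoded JSON records in chunks (keeps each INSERT packet small).
foreach (array_chunk($records, 1000) as $chunk) {
    DB::table('import_staging')->insert($chunk);
}

// 2. Update rows that already exist with one multi-table UPDATE.
DB::statement('
    UPDATE your_table t
    JOIN import_staging s ON s.legacy_id = t.legacy_id
    SET t.name = s.name, t.value = s.value
');

// 3. Insert rows that do not exist yet with one INSERT ... SELECT.
DB::statement('
    INSERT INTO your_table (legacy_id, name, value)
    SELECT s.legacy_id, s.name, s.value
    FROM import_staging s
    LEFT JOIN your_table t ON t.legacy_id = s.legacy_id
    WHERE t.legacy_id IS NULL
');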

2 Answers


A lot probably depends on the exact data, but I would try to work with batches. First take, for example, 500 records from the JSON, then run queries to check which of those records already exist.

For the records that exist, run an update for each of them. For the records that don't exist, don't insert them immediately; instead, collect them into an array and insert them in bulk like this:

DB::table('your_table')->insert($records);

You should experiment with the batch size; batches that are too large or too small will slow the process down.

In general, bulk inserts can speed up your operations a lot, but everything depends on the data you have and on the exact algorithm used to load it.
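A rough sketch of this approach (here existence is checked with a single whereIn query per batch rather than one query per record; the table, column, and variable names are placeholders):

use Illuminate\Support\Facades\DB;

foreach (array_chunk($records, 500) as $batch) {
    // Find which records in this batch already exist (one SELECT per batch).
    $ids = array_column($batch, 'legacy_id');
    $existing = array_flip(
        DB::table('your_table')->whereIn('legacy_id', $ids)->pluck('legacy_id')->all()
    );

    $toInsert = [];
    foreach ($batch as $record) {
        if (isset($existing[$record['legacy_id']])) {
            // Existing row: update it individually.
            DB::table('your_table')
                ->where('legacy_id', $record['legacy_id'])
                ->update(['name' => $record['name'], 'value' => $record['value']]);
        } else {
            // New row: defer it and insert in bulk below.
            $toInsert[] = $record;
        }
    }

    if (!empty($toInsert)) {
        DB::table('your_table')->insert($toInsert);
    }
}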

Marcin Nabiałek
  • I like this approach. It will definitely speed things up, but I'm afraid not by much, since I'm pretty sure 95% of the data is just updates (if it even needs to be updated at all, since everything is dumped whether it has changed or not). – Neve12ende12 Jul 25 '17 at 22:01
  • Are you checking whether the data has actually changed before updating? Select queries are much less expensive than updates. – fubar Jul 25 '17 at 22:21
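Following up on that last comment, a sketch of how unchanged rows could be skipped inside each batch before the update/insert logic above (again with placeholder column names):

// Fetch the current values for this batch, keyed by legacy_id.
$current = DB::table('your_table')
    ->whereIn('legacy_id', array_column($batch, 'legacy_id'))
    ->get()
    ->keyBy('legacy_id');

foreach ($batch as $record) {
    $row = $current->get($record['legacy_id']);

    // Row exists and nothing changed: skip the UPDATE entirely.
    // Loose comparison, since MySQL may return numeric values as strings.
    if ($row !== null
        && $row->name == $record['name']
        && $row->value == $record['value']
    ) {
        continue;
    }

    // Otherwise fall through to the update / bulk-insert logic shown above.
}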

Try dividing your data into chunks and inserting each chunk in bulk. It should reduce the time; also try not to insert the same data repeatedly.

$dataChunks = array_chunk($data, 10000);

foreach ($dataChunks as $dataChunk) {
    // Insert the whole chunk in a single query instead of one query per record.
    DB::table('your_table')->insert($dataChunk);
}

You can adjust the chunk size according to your data's size.

theviz02