
For the first time in my 7 years of development experience, I have come across a use case where I need to update thousands of rows at a time in a MySQL database. I'm using Knex.js as the ORM in Node.js.

My query is as follows

// Run every per-row update inside one transaction and commit once all of them resolve.
await db.transaction(trx => {
  const queries = collection.map(tuple => {
    // `tuple` holds the new column values for one row, including its id.
    return db('table')
      .where('id', tuple.id)
      .update(tuple)
      .transacting(trx);
  });

  return Promise.all(queries)
    .then(trx.commit)
    .catch(trx.rollback);
});

Now let's say my collection is an array of a million records; this will take forever to complete, and a user can't stay on the site that long. I am open to a cache layer like Redis or Elasticsearch, but I don't think they will help in this use case. Then again, I have never used such a solution, so I'm open to anything involving a third party as long as I can insert / update thousands (if not millions) of rows within seconds.
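For illustration, here is a minimal sketch of chunking the same per-row updates (updateInChunks and CHUNK_SIZE are made-up names; it assumes each tuple carries its own id). This bounds how long each transaction stays open, but it still issues one UPDATE per row, so on its own it may not reach a few-seconds target:

const CHUNK_SIZE = 1000; // arbitrary batch size

async function updateInChunks(db, collection) {
  for (let i = 0; i < collection.length; i += CHUNK_SIZE) {
    const chunk = collection.slice(i, i + CHUNK_SIZE);
    // One transaction per chunk; Knex commits when the callback resolves
    // and rolls back if it throws.
    await db.transaction(async trx => {
      await Promise.all(
        chunk.map(tuple => trx('table').where('id', tuple.id).update(tuple))
      );
    });
  }
}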

Clarity of what I want

In the end, I am sending JSON (containing the updated data for different records) from Postman to the Node.js backend, which results in a wait of several minutes. I want to reduce this to a couple of seconds (2 - 8 seconds). Any suggestion is appreciated. Thank you.

Mani
  • I would take an approach like https://stackoverflow.com/a/40555000/17389 (abusing INSERT ... ON DUPLICATE KEY UPDATE to do the updates), but do a thousand at a time (the limit is going to be the total data exceeding max_allowed_packet); see the first sketch after these comments. – ysth Jul 07 '22 at 17:06
  • but if you truly have millions that might still be too slow – ysth Jul 07 '22 at 17:07
  • You can use [LOAD DATA INFILE ... REPLACE](https://dev.mysql.com/doc/refman/8.0/en/load-data.html) to do a bulk load of rows and update the rows by their id. That's likely to be an order of magnitude faster than executing thousands of updates one row at a time; see the second sketch after these comments. – Bill Karwin Jul 07 '22 at 17:09
  • On the other hand, you haven't described anything about the table, so we're only assuming `id` is indexed. We're also assuming your network isn't the bottleneck. If the network is slow enough and the payload is bulky, there may be no way to improve the time, because it simply takes more seconds than you want just to transfer the data over the network. – Bill Karwin Jul 07 '22 at 17:11
  • @BillKarwin As I mentioned, `collection` can be in the thousands, so 100,000 records means 100,000 update queries, and `id` is the primary key. My network is good enough; I guess that's not the problem. – Mani Jul 07 '22 at 20:27
  • Did you try batching them as shown in the answer I linked (but see my comment on that answer)? Was it fast enough? – ysth Jul 08 '22 at 01:57
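A rough Knex sketch of the batched INSERT ... ON DUPLICATE KEY UPDATE approach ysth describes above (assuming a Knex version with onConflict support and that every object in collection contains all the columns to be written; bulkUpsert and BATCH_SIZE are placeholder names):

const BATCH_SIZE = 1000; // keep each statement under max_allowed_packet

async function bulkUpsert(db, collection) {
  for (let i = 0; i < collection.length; i += BATCH_SIZE) {
    const batch = collection.slice(i, i + BATCH_SIZE);
    // On MySQL, onConflict().merge() compiles to a multi-row
    // INSERT ... ON DUPLICATE KEY UPDATE, so rows whose id already
    // exists are updated in place instead of inserted.
    await db('table').insert(batch).onConflict('id').merge();
  }
}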
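And a very rough sketch of the LOAD DATA ... REPLACE approach Bill Karwin describes above (untested; column names are placeholders, local_infile must be enabled on both client and server, and REPLACE deletes and re-inserts matching rows, so every column must appear in the file):

const fs = require('fs');

async function loadDataReplace(db, collection, csvPath) {
  // Dump the rows to a CSV file in a fixed column order (placeholder columns).
  const csv = collection
    .map(t => [t.id, t.col1, t.col2].join(','))
    .join('\n');
  fs.writeFileSync(csvPath, csv);

  // In knex.raw, ?? binds an identifier and ? binds a value.
  await db.raw(
    "load data local infile ? replace into table ?? " +
    "fields terminated by ',' lines terminated by '\\n' (id, col1, col2)",
    [csvPath, 'table']
  );
}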

0 Answers