
I'm using Node.js and MySQL to insert 500+ rows with 50+ columns and update the row if the key already exists.

There are already Stack Overflow posts explaining how to bulk insert, and how to bulk insert and update, but those solutions don't scale well to many columns. For each column that should be updated, they suggest adding column_name = VALUES(column_name), which is tedious to write out for 50+ columns. Any recommendations on how to achieve this faster?
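For reference, the pattern those posts lead to looks like this once written out by hand (a sketch with hypothetical table and column names):

// the hand-written version the existing posts suggest (hypothetical names)
let sql = `
  INSERT INTO my_table (col1, col2, /* ...and so on... */ col50)
  VALUES ?
  ON DUPLICATE KEY UPDATE
    col1 = VALUES(col1),
    col2 = VALUES(col2),
    -- one such line per column, 48 more times
    col50 = VALUES(col50)`;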

cwallenwein

2 Answers


I usually accomplish this with an approach similar to the one outlined below. It is agnostic of how many columns there are and what the column names are. I use a version of this as a function that takes an object and a table name, so the same code works across multiple tables; a sketch of that function form follows the walkthrough below. Just ensure that when your objects are constructed they have keys which match the column names of the target table.

// an example payload: an array of objects that each contain the
// same key/value structure; in my case this is often the result
// set of a prior query
let test_payload = [
  {
    record_id: 1001,
    value_1: 'robot',
    value_2: null,
    value_3: 'cat'
  },
  {
    record_id: 1002,
    value_1: 'robot',
    value_2: null,
    value_3: 'dog'
  },
  {
    record_id: 9001,
    value_1: 'robot',
    value_2: 'cat',
    value_3: "dog"
  }
];

// let the column name strings of the query be taken from the first
// object in your set
let the_keys = Object.keys(test_payload[0]);

// build the "col = VALUES(col)" portion of the UPDATE clause by
// mapping over the_keys and joining the entries with a comma and space
let updates = the_keys
  .map(key => `${key} = VALUES(${key})`)
  .join(', ');

// let the VALUES be mapped from the values of your objects into an
// array of arrays
let values = test_payload.map( obj => the_keys.map( key => obj[key]));

// build the query from your column names and pass your values using the
// standard question mark placeholder. the first example is for your
// specific use case; an alternate query approach to consider follows

// 1. an example for your specific use case
let sql = 'INSERT INTO table_name (' + the_keys.join(',') + ') VALUES ? ON DUPLICATE KEY UPDATE ' + updates + ';';
// call the query function and pass it your query string, along with values
db.query(sql, [values], function (error, results) {
  if (error) throw error;
  console.log('query results:', results);
  // do something else with your results…
});

// it is sometimes my personal preference to maintain a column such as
// update_count, which can be used to gauge how many rows of the total query
// were inserts vs. updates. if no value changes on an existing record, no
// update is counted in results.changedRows - this approach ensures at least
// one value always changes and registers in results.changedRows
let sql_example_with_update_count = 'INSERT INTO table_name (' + the_keys.join(',') + ') VALUES ? ON DUPLICATE KEY UPDATE update_count = update_count + 1;';
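// if you would rather not add a column, the driver's result object can
// give a rough split instead. per the MySQL docs, with ON DUPLICATE KEY
// UPDATE each inserted row counts as 1 in results.affectedRows and each
// updated row as 2, so inside the query callback (a sketch - it assumes
// every duplicate row actually changed a value, e.g. via the update_count
// trick above, so no row counts as 0):
let totalRows = values.length;
let updatedRows = results.affectedRows - totalRows;
let insertedRows = totalRows - updatedRows;
console.log(`inserted: ${insertedRows}, updated: ${updatedRows}`);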

// for reference, here is each built item:
console.log(the_keys)
["record_id", "value_1", "value_2", "value_3"]

console.log(updates)
"record_id = VALUES(record_id), value_1 = VALUES(value_1), value_2 = VALUES(value_2), value_3 = VALUES(value_3)"

console.log(values)
[[1001, "robot", null, "cat"], [1002, "robot", null, "dog"], [9001, "robot", "cat", "dog"]]
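
Wrapped up as the reusable function mentioned at the top, the whole approach might look something like this (a sketch; upsertRows is a hypothetical name and db is assumed to be an initialized connection):

// a sketch of the reusable form: pass a connection, a table name, and an
// array of uniform objects whose keys match the target table's columns.
// note: table and column names are interpolated directly into the SQL,
// so only feed this trusted, non-user-supplied names
function upsertRows(db, table, rows, callback) {
  let keys = Object.keys(rows[0]);
  let updates = keys.map(key => `${key} = VALUES(${key})`).join(', ');
  let values = rows.map(obj => keys.map(key => obj[key]));
  let sql = 'INSERT INTO ' + table + ' (' + keys.join(',') +
            ') VALUES ? ON DUPLICATE KEY UPDATE ' + updates + ';';
  db.query(sql, [values], callback);
}

// usage with the example payload above
upsertRows(db, 'table_name', test_payload, function (error, results) {
  if (error) throw error;
  console.log('query results:', results);
});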

This example assumes you know how to initialize and use connections. If not, you can review this example: https://github.com/dusthaines/mysqljs_setup_snippet/blob/master/app.js

dusthaines

If you need to either insert or update each of 500 rows, and there are 50 columns potentially inserted, then IODKU (INSERT ... ON DUPLICATE KEY UPDATE) is likely to be a clear winner in speed.

The 50 col = VALUES(col) entries are only 50 lines to generate. I might construct this with something like (using PHP syntax):

$query = "INSERT ... ";
$updates = [];
foreach (explode(' ', 'col1 col2 col3 ...') as $col) {
    $updates[] = "$col = VALUES($col)";
}
$query .= implode(', ', $updates);

The other dimension is that 500*50 = 25,000 values need to be fed to the IODKU.

REPLACE is slower because it is effectively DELETE + INSERT. IODKU is a combination of UPDATE and INSERT, and not much slower than either one individually.

Notice that if there is an AUTO_INCREMENT column, it will always consume 500 ids, regardless of how many were updated versus inserted.

If you are actually not changing any of the other columns, use INSERT IGNORE or the following 2-step method, which avoids burning ids:

http://mysql.rjweb.org/doc.php/staging_table#normalization
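
The core of that approach is to attempt inserting only the keys that do not exist yet, so the AUTO_INCREMENT counter is never advanced for rows already present. A minimal sketch of that step in the OP's Node context (hypothetical table and column names; the linked page has the full staging-table recipe):

// insert only genuinely new keys, so no ids are burned;
// staging_table holds the incoming batch, real_table is the target
db.query(
  `INSERT INTO real_table (record_id, value_1)
     SELECT s.record_id, s.value_1
       FROM staging_table AS s
       LEFT JOIN real_table AS r ON r.record_id = s.record_id
      WHERE r.record_id IS NULL`,
  function (error, results) {
    if (error) throw error;
    console.log('new rows inserted:', results.affectedRows);
  }
);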

Rick James