0

I want to insert lots of records in batches. Some of the fields are varchars, so I want to pass that data as dynamic parameters. I am aware I could use exec_batch(), but that wouldn't be a good solution performance-wise, as it seems to send each row to the DB in separate calls. So what I basically want is to insert records in batches via building larger INSERT INTO statements that would insert (say) 1000 records each. But that would mean thousands of dynamic parameters for each of those inserts.

Methods like exec_drop() can accept tuples, but up to 12 elements. So how can I pass thousands of values of different/mixed types (ints, strings, etc.)? I am guessing I should use a vector of the so called "trait objects" somehow?

Perhaps the answer is in the docs, but I am still new to Rust and can't quite understand how to achieve that.

at54321
  • 8,726
  • 26
  • 46

2 Answers2

1

I am not a Rust programmer, but the performance considerations should be similar regardless of application language.

For this case, I would just prepare a query to INSERT one row, with parameters for the columns of that one row. Then run a loop to execute the prepared query with different values for the parameters. This factors out of the loop the overhead of parsing the SQL, because that's done only once on the prepare step. Executing that prepared query doesn't re-parse the statement.

I hope this is exactly what the exec_batch() is doing. But it might be re-preparing the INSERT statement each time in the loop. That would be a naive approach, but I don't know if the Rust package is naive or not.

If this is not performant enough, you should consider loading bulk quantities of data using LOAD DATA [LOCAL] INFILE. That's typically several times faster than any INSERT statement even with a set of 1000 tuples.

You might like my presentation Load Data Fast! in which I compared the relative performance of different data import solutions.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Actually the overhead of sending each row separately is generally bigger than the overhead of parsing the bigger SQL statements, even on localhost. And if your DBMS is on a remote machine, the extra latency for every request to the DBMS can have a really huge impact - in many cases orders of magnitude bigger than the parse overhead. – at54321 Apr 21 '21 at 20:19
  • As for the LOAD DATA statement - I'm aware of it, but in my case it wouldn't be a good fit for a couple of reasons. Also, the problem I've described could be experienced even with single-rec DMLs if they have more than 12 parameters. – at54321 Apr 21 '21 at 20:24
  • Yes, I think this represents a pretty huge design flaw in that Rust interface. I understand that [Rust doesn't support variadic functions](https://stackoverflow.com/questions/28951503/how-can-i-create-a-function-with-a-variable-number-of-arguments). Oh well. – Bill Karwin Apr 21 '21 at 20:42
0

OK, I found the way to do it. Simply create a Vec<Value> and push the params in it via Value::from(...) like this:

let mut params = Vec::new();
for r in recs {
    params.push(Value::from(&r.someString));
    params.push(Value::from(&r.someInt));
    params.push(Value::from(&r.someFloat));
    ...
}

That way we can easily pass thousands of parameters of different type.

However, beware there still seems to be a potential issue related to Rust's MySQL lib always preparing the statements, as described here. Although if one can afford to only pass varying values as dynamic parameters, then there will only be 1 prepared statement, so that should at least avoid the overhead of unneeded preparation and caching.


Previous answer (a partial solution)

I came up with a partial solution. It's far from perfect and won't work in all cases, but I will post it here, in case it helps readers. And I still hope that someone will come up with a better solution.

In most cases it is only the strings that really need to be escaped. Most of the other types could be safely inserted as static parameters. So we could basically generate a query like this:

INSERT INTO my_table(id, dt, name, notes, some_num)
VALUES
  (1, '2021-04-15 21:55:27', ?, ?, 6871.34),
  (2, '2021-04-15 22:33:44', ?, ?, NULL),
  (3, '2021-04-15 21:55:27', ?, ?, 55.1),
  (4, NULL, ?, ?, 55.1);

and pass the dynamic parameters with a vector of specific type like Vec<&Option<String>>, which would handle NULL values for us as well.

at54321
  • 8,726
  • 26
  • 46