
I am writing a program in which multiple threads insert into a database. Example:

public static void save(String name) {
    try (PreparedStatement preparedStatement = ...insert...) {
        preparedStatement.setString(1, name);
        preparedStatement.executeUpdate();
        // no explicit close() needed - try-with-resources closes the statement
    } catch (SQLException e) {
        // handle/log the error
    }
}

Question: when threads simultaneously execute inserts into a table, could it happen that one thread calls executeUpdate() on the PreparedStatement belonging to another thread?

Rusandal

1 Answer


Absolutely. You should not be doing this - each thread needs to have its own database connection (which implies it necessarily also ends up having its own PreparedStatement).

Better yet - don't do this. You're just making things confusing and slower, it's lose-lose-lose. There is no benefit at all to your plan. The database isn't going to magically do the job faster if you insert from 2 threads simultaneously.

The conclusion is simple: threads are a really bad idea when INSERTing a lot of data into the same table, so DO NOT DO IT!

But I really want to speed up my INSERTs!

My data gathering is slow

IF (big if!!) gathering the data for insertion is slower than the database can insert records for you, and the data gathering job lends itself well to multi-threading, then have threads that gather the data, but have these threads put objects onto a queue, and have a separate 'DB inserter' thread (the only thread that even has a connection to the DB) that pulls objects off this queue and runs an INSERT.

If you can gather the data quickly, or the source does not lend itself to multi-threading, this only makes your code longer, harder to understand, considerably harder to test, and slower. No point at all.

Useful tools: LinkedBlockingQueue - an instance of this is the one piece of shared data all threads have. Your data gatherer threads toss objects onto this queue, and your single DB inserter thread fetches objects off of it.
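To make the pattern concrete, here is a minimal self-contained sketch of the fan-in setup: several gatherer threads put items on a LinkedBlockingQueue and a single inserter thread drains it. The class/method names are made up, and an in-memory List stands in for the database table - in real code, only the inserter thread would hold the Connection and run the INSERTs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class GatherAndInsert {
    // Sentinel object telling the inserter thread that all gatherers are done.
    private static final String POISON_PILL = "\u0000STOP";

    public static List<String> run(int gatherers, int itemsPerGatherer) {
        try {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
            List<String> inserted = new ArrayList<>(); // stands in for the DB table

            // The single 'DB inserter' thread - the only one that would touch the DB.
            Thread inserter = new Thread(() -> {
                try {
                    while (true) {
                        String item = queue.take(); // blocks until data arrives
                        if (item.equals(POISON_PILL)) return;
                        inserted.add(item); // real code: ps.setString(1, item); ps.executeUpdate();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            inserter.start();

            // Gatherer threads: produce data, never touch the DB.
            List<Thread> producers = new ArrayList<>();
            for (int g = 0; g < gatherers; g++) {
                final int id = g;
                Thread t = new Thread(() -> {
                    try {
                        for (int i = 0; i < itemsPerGatherer; i++) {
                            queue.put("row-" + id + "-" + i); // blocks if the queue is full
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start();
                producers.add(t);
            }
            for (Thread t : producers) t.join();
            queue.put(POISON_PILL); // all gatherers finished; tell the inserter to stop
            inserter.join();
            return inserted;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The bounded queue (capacity 1000 here) also gives you backpressure for free: if the gatherers outpace the inserter, put() blocks instead of filling up memory.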

General insert speed advice 1: bundling

DBs work in transactions. If you have autocommit mode on (and Connections start in this mode), that's not 'no transactions'. It merely means (hence the name): the DB commits after every statement. You can't do 'non-transactional' in proper databases. A commit() is heavy (it takes a long time to process), but so are excessively long transactions (doing many thousands of things in a single transaction before committing). Thus, you get the goldilocks principle: you want to run about 500 or so inserts, then commit.

Note that this has a downside: If an error occurs halfway through this process, then some records have been committed and some haven't been. Keep that in mind - your process needs to be idempotent or that is likely not acceptable and you'd need to make it idempotent (e.g. by having a column that lists the 'insert session' id, so you can delete them all if the operation cannot be completed properly) - and if your DB is simultaneously used by other stuff, you need more complexity as well (some sort of flag or filter so that other simultaneous code doesn't consider any of the committed, inserted records until the entire batch is completely added).
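As a toy illustration of the 'insert session' idea: tag every row with a batch id, so that if the batch dies halfway, the rows that did get committed can be deleted wholesale. Everything here (the Row record, insertBatch, failAfter) is invented for the demo, and an in-memory List stands in for the table; in real code the tag would be an insert_session column driving an INSERT and a cleanup DELETE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class InsertSessionDemo {
    record Row(String name, String session) {}

    static final List<Row> table = new ArrayList<>(); // stands in for the DB table

    // Inserts all names under one session id; 'failAfter' simulates a
    // mid-batch crash after that many rows (pass -1 for no crash).
    public static void insertBatch(List<String> names, int failAfter) {
        String session = UUID.randomUUID().toString();
        try {
            int n = 0;
            for (String name : names) {
                if (n++ == failAfter) throw new RuntimeException("simulated mid-batch failure");
                table.add(new Row(name, session)); // real code: INSERT (name, insert_session) VALUES (?, ?)
            }
        } catch (RuntimeException e) {
            // Cleanup: remove every row this half-finished session managed to add,
            // restoring the pre-batch state so the whole batch can be retried.
            table.removeIf(r -> r.session().equals(session)); // real code: DELETE ... WHERE insert_session = ?
        }
    }
}
```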

Relevant methods:

con.setAutoCommit(false); // we manage commits ourselves
try (PreparedStatement ps = con.prepare.......) {
  int inserted = 0;
  while (!allGenerationDone) {
    Data d = queue.take();
    ps.setString(1, d.getName());
    ps.setDate(2, d.getBirthDate());
    // set the other stuff
    ps.executeUpdate();
    if (++inserted % 500 == 0) con.commit(); // commit every 500 inserts
  }
}
con.commit(); // commit the final, partial batch

General insert speed advice 2: bulk

Most DB engines have special commands for bulk insertion. From a DB engine's perspective, various cleanup and care tasks take a ton of time and may not even be necessary, or can be combined to save a load of time, when doing bulk inserts. Specifically, checking of constraints (particularly, reference constraints) and building of indices takes most of the time of processing an INSERT, and these things can either be skipped entirely or sped up considerably by doing them in bulk all at once at the end.

The way to do this is highly dependent on the underlying database. For example, in postgres, you can turn off constraint checking and turn off index building, then run your inserts, then re-enable. You can even choose to omit constraint checks entirely (meaning, your DB could be in an invalid state if your code is messed up, but if speed is more important than safety this can be the right way to go about it). Index building is considerably faster if done at the end.

Other databases generally have similar strategies. Alternatively, there are commands that combine it all, generally called COPY (instead of INSERT). Check your DB engine's docs.
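For illustration, a postgres-flavoured sketch of the "drop indices, skip checks, bulk-load, rebuild" sequence could look like this. The table, index, column, and file names are all invented, and note that DISABLE TRIGGER ALL requires superuser rights and really does skip FK enforcement:

```sql
-- Hypothetical postgres sketch; people, people_name_idx, and the CSV path are made up.
BEGIN;
DROP INDEX IF EXISTS people_name_idx;      -- rebuild once at the end instead of per row
ALTER TABLE people DISABLE TRIGGER ALL;    -- also skips FK checks (superuser only, use with care)
COPY people (name, birth_date) FROM '/data/people.csv' WITH (FORMAT csv);
ALTER TABLE people ENABLE TRIGGER ALL;
CREATE INDEX people_name_idx ON people (name);
COMMIT;
```

Server-side COPY reads the file on the DB server; from a client, psql's \copy (or the JDBC driver's copy API) does the equivalent over the connection.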

Read this SO question for some info and benchmarks on how COPY compares to INSERT. And search the web for e.g. 'mysql bulk insert'.

rzwitserloot
  • Thank you for the detailed answer. I agree. I tried using a single insert thread (a PreparedStatement fed from the queue in a "while" loop) in the end, but lost data after an SQLException. I didn't dig in and went with the current implementation instead. I will do the insertion in one thread from the queue. Thanks! – Rusandal Nov 30 '21 at 13:05