
I can't figure out whether I should use 2 separate queries or if just 1 is possible.

I need to UPDATE a single row in table records but then INSERT 50,000 rows into table runners.

records

rc_id   runner       msg  
    1     bill    'hello'

runners

r_id    runner   km_run  
   1      mary      3.5
   2      anna      1.5
   3      john      6.5
   4      bill      1.5
   5      jess      6.2
   6      jack      2.5

I could write a query like the one below, but it would perform the UPDATE in every batch along with the INSERT in my code.

Given my Java code, how do I adapt this query so that the UPDATE is performed once and the INSERT 50,000 times? If 2 queries are the standard approach, please let me know.

Working SQLFiddle

WITH r_updater AS
  (UPDATE records
   SET msg = 'goodbye'
   WHERE runner = 'bill')
INSERT INTO runners (runner, km_run)
VALUES ('george', 2.3)
ON CONFLICT (runner, km_run) DO UPDATE
SET runner = EXCLUDED.runner,
    km_run = EXCLUDED.km_run;
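
For reference, the UPDATE inside a data-modifying CTE runs exactly once per statement execution; it only repeats because my code re-executes the whole statement for each batch. A minimal single-statement sketch (the extra runner rows are made up) that updates once while inserting several rows via a multi-row VALUES list:

WITH r_updater AS
  (UPDATE records
   SET msg = 'goodbye'
   WHERE runner = 'bill')
INSERT INTO runners (runner, km_run)
VALUES ('george', 2.3),  -- the CTE's UPDATE runs once
       ('paula',  4.1),  -- no matter how many rows
       ('steve',  0.9)   -- this VALUES list supplies
ON CONFLICT (runner, km_run) DO UPDATE
SET runner = EXCLUDED.runner,
    km_run = EXCLUDED.km_run;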

Java Code:

   Connection conn = null;
   PreparedStatement stmt = null;
   try {
       conn = MyDBSource.getInstance().getConnection();

    // This is the first of 2 queries I'm running. I want to combine them somehow
       String updateQuery = "UPDATE records SET msg = 'goodbye' WHERE runner = 'bill'";
       stmt = conn.prepareStatement(updateQuery);
       stmt.executeUpdate();

       conn.setAutoCommit(false);

       String insertQuery = "INSERT INTO runners (runner, km_run) VALUES (?, ?) "
           + "ON CONFLICT (runner, km_run) DO UPDATE "
           + "SET runner = EXCLUDED.runner, km_run = EXCLUDED.km_run";

       stmt = conn.prepareStatement(insertQuery);
       int batchSize = 100;
       int rows = 0;

       // iterate the map's values (entrySet() would yield Map.Entry objects)
       for (RunnerModel runnerModel : runnerModelMap.values()) {
           int i = 0;
           stmt.setString(++i, runnerModel.getRunner());
           stmt.setDouble(++i, runnerModel.getKmRun()); // km_run is numeric
           stmt.addBatch();
           if (++rows % batchSize == 0) {
               stmt.executeBatch();
           }
       }
       if (rows % batchSize != 0) {
           stmt.executeBatch();
       }
       conn.commit();
   } catch (SQLException e) {
       e.printStackTrace();
       if (e.getNextException() != null) {
           // a BatchUpdateException chains the underlying cause here
           e.getNextException().printStackTrace();
       }
   } finally {
       closeResources(stmt, conn);

   }

If you're wondering why I'm using a HashMap at all, see here.

Mathomatic
  • *"I can't figure out whether I should use 2 separate queries"* Don't use any *queries*, use DML *statements*. Only the `SELECT` statement is a query. – Andreas May 09 '19 at 19:49
  • Why are you turning auto-commit off *after* the update statement? Don't you want the update statement to be part of the same transaction as the insert statements, so the update is rolled back too if an insert fails? – Andreas May 09 '19 at 19:53
  • *"how do I adapt this query so that the UPDATE is performed once, and the INSERT 50,000 times"* Isn't that what the code is already doing? One update and a loop over batched inserts. Seems you're already doing what you asking how to do. – Andreas May 09 '19 at 19:54
  • @Andreas fair enough about the syntax. As for your newest comment... My current code performs 2 "requests" from the Database doesn't it? I assumed that the first `stmt.executeUpdate()` might not be needed as I could somehow integrate the command into the `SELECT` command. – Mathomatic May 09 '19 at 19:57
  • @Andreas, in terms of why auto-commit is off, that's because in a previous question I asked, the solution was to turn it off at the beginning. BUT, that didn't have the context of the `UPDATE` associated like this current question. So apparently I should remove it. Reference: https://stackoverflow.com/a/55940741/4594511 – Mathomatic May 09 '19 at 19:58
  • I'm clearly not understanding how `atomic` works. Are you saying my current approach is the most efficient given my requirements, since the update statement is considered part of the same transaction? (Assuming I remove `conn.setAutoCommit(false);` ?) I assumed `stmt.executeUpdate()` was its own individual, expensive transaction and that the subsequent `stmt.executeBatch()` were another. I was trying to combine the former into the latter to avoid 2 transactions. – Mathomatic May 09 '19 at 20:00
  • 1
    Assuming the hashmap has 50000 entries, your code performs 50001 "requests". Don't know what value `batchSize` has, but if e.g. 1000, then the code batches those into 51 separate "conversations" with the server. I don't see the problem with that. --- *"integrate the command into the SELECT command"* What `SELECT` command? --- I didn't ask why you turned auto-commit off. I asked why you did it *after* the `UPDATE`, not before it. – Andreas May 09 '19 at 20:03
  • 1
    If you remove `setAutoCommit(false)`, then you're doing multiple transactions, the exact opposite of what you should be doing. --- You kind of assumed correctly. The `executeUpdate()` is one transaction, because you haven't turned auto-commit off yet, and all the `executeBatch()` calls together are another single transaction, because you did turn auto-commit off by then. Your reasoning for that assumption seems incorrect though. --- If you want all of it in a single transaction, turn auto-commit off *before* the update. You would usually turn it off as soon as you get the `Connection` object. – Andreas May 09 '19 at 20:08
  • Ohh, that makes perfect sense! Thanks for that. One question - You say you don't see the problem with 51 separate "conversations". I'm assuming a "conversation" is not the same as a "transaction". If so, I'm left a little confused as to the point of batching so as to minimize conversations, if the transactions will only be 1 regardless, assuming proper placement of `setAutoCommit(false)`. Obviously 51 transactions would be horribly inefficient, but I'm assuming 1 transaction with 51 conversations is acceptable? Thx. – Mathomatic May 09 '19 at 20:10
  • A conversation takes time, because you have to stop, send the data, wait for the server, then read the data. The amount of time is fairly small, so a few conversations don't matter, but the difference between 50000 and 50 conversations can definitely be felt. --- Now, *establishing* a connection is even more costly, so you should make sure to use pooled connections. Don't know if your code is doing that. – Andreas May 09 '19 at 20:19
  • Yes, I'm pooling using `BasicDataSource`. I can't find a better approach given my necessity to iterate such a large `HashMap`. I'm going to go with it. – Mathomatic May 09 '19 at 20:23
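
Pulling the comment thread together: a minimal sketch of the arrangement Andreas describes, with auto-commit turned off as soon as the connection is obtained so the UPDATE and all batched INSERTs commit (or roll back) as one transaction. `RunnerModel`, `runnerModelMap`, `MyDBSource` and `closeResources` are the question's own names; `getKmRun()` is assumed here to return a double.

    Connection conn = null;
    PreparedStatement stmt = null;
    try {
        conn = MyDBSource.getInstance().getConnection();
        // Turn auto-commit off *before* the UPDATE so that it and all
        // batched INSERTs below belong to a single transaction.
        conn.setAutoCommit(false);

        stmt = conn.prepareStatement(
                "UPDATE records SET msg = 'goodbye' WHERE runner = 'bill'");
        stmt.executeUpdate();
        stmt.close();

        stmt = conn.prepareStatement(
                "INSERT INTO runners (runner, km_run) VALUES (?, ?) "
              + "ON CONFLICT (runner, km_run) DO UPDATE "
              + "SET runner = EXCLUDED.runner, km_run = EXCLUDED.km_run");

        int batchSize = 100;
        int rows = 0;
        for (RunnerModel runnerModel : runnerModelMap.values()) {
            stmt.setString(1, runnerModel.getRunner());
            stmt.setDouble(2, runnerModel.getKmRun()); // assumed double
            stmt.addBatch();
            if (++rows % batchSize == 0) {
                stmt.executeBatch(); // one round trip per batchSize rows
            }
        }
        if (rows % batchSize != 0) {
            stmt.executeBatch(); // flush the final partial batch
        }
        conn.commit(); // a single commit covers the UPDATE and every INSERT
    } catch (SQLException e) {
        e.printStackTrace();
    } finally {
        closeResources(stmt, conn);
    }

If anything fails, nothing is committed, so the UPDATE is rolled back along with the inserts, which addresses the concern raised in the second comment above.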

0 Answers