1

I'm using Java 8, JDBC and MySql. I want to insert a large amount of data (2,000 rows) into 2 tables. The tables have a 1 to 1 relationship. First table is order_items:

| id      | amount          |
|:--------|----------------:|
| 1       | 20              |
| 2       | 25              |
| 3       | 30              |

Second table is delivery_details:

| orderItemId     | message    |
|----------------:|:-----------|
| 1               | hello.     |
| 2               | salut.     |
| 3               | ciao.      |

orderItemId is a foreign key to order_items.

The data is represented in this class:

public class OrderItemDelivery {

    @SerializedName("amount")
    private BigDecimal amount = null;

    @SerializedName("message")
    private String message = null;


    // getters and setters below
    ...
    ...

}

I need to execute the inserts as a batch to cut execution time. List<OrderItemDelivery> orderItemDeliveries contains 2,000 items. My current code is:

Connection connection = this.hikariDataSource.getConnection();
connection.setAutoCommit(false);
Statement statement = connection.createStatement();


for (int x = 0; x < orderItemDeliveries.size(); x++) {

    sql = String.format("INSERT INTO order_items (amount) VALUES ('%s')", orderItemDelivery.getAmount());
    statement.addBatch(sql);

    sql = String.format("INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (LAST_INSERT_ID(), '%s')", orderItemDelivery.getMessage());        
    statement.addBatch(sql);

}

statement.executeBatch();
statement.close();
connection.setAutoCommit(true);
connection.close();

This is really efficient, but the limitation here is it's open to SQL Injection. If I was to use PreparedStatement, I would need one for the order_items batch and one for the delivery_details batch. And then LAST_INSERT_ID() would not work.

Is there any way around this? From what I've seen, there isn't. And I need to prevent SQL Injection by sanitising the message and amount with Java, which appears to have limitations. For example message can contain apostrophies and emojis. Can anyone think of another solution?

EDIT

Here's a really efficient solution I've come up with:

String orderItemSql = "INSERT INTO order_items (amount) VALUES (?) ";

for (int x = 1; x < orderItemDeliveries.size(); x++) {
    orderItemSql += ", (?)";
}

PreparedStatement preparedStatement = connection.prepareStatement(orderItemSql, Statement.RETURN_GENERATED_KEYS);

int i = 1;
for (int x = 0; x < orderItemDeliveries.size(); x++) {

    preparedStatement.setDouble(i++, orderItemDelivery.getAmount().doubleValue());

}

preparedStatement.executeUpdate();
Long ids[] = new Long[orderItemDeliveries.size()];

ResultSet rs = preparedStatement.getGeneratedKeys();
int x = 0;
while (rs.next()) {
    ids[x] = rs.getLong(1);
    x++;
}


String deliveryDetails = "INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (?, ?)";
for (x = 1; x < orderItemDeliveries.size(); x++) {
    deliveryDetails += ", (?)";
}

preparedStatement = connection.prepareStatement(deliveryDetails);

i = 1;
for (x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);

    preparedStatement.setLong(i++, ids[x]);
    preparedStatement.setString(i++, orderItemDelivery.getMessage());
}

preparedStatement.executeUpdate();

So for this to work, the order of the ids must be sequential, and the order of orderItemDeliveries must not change between the first loop through of the list and the second.

This feels a bit hacky, but it works. Am I missing something?

Mark
  • 4,428
  • 14
  • 60
  • 116
  • I recommend you to focus on avoiding SQL injection, for it's a security matter, even if it implies giving away optimization. So, you should use `PreparedStatement` and `executeUpdate`. – Little Santi Apr 04 '20 at 18:25
  • Is it even possible with `PreparedStatement `? I need to execute 1 insert into `order_items`, then one into `delivery_details` to get `LAST_INSERT_ID()`, then the next one into `order_items` and so on. This question meets a similar issue: https://stackoverflow.com/questions/9547734/execute-multiple-inserts-with-preparedstatement-java – Mark Apr 04 '20 at 18:51
  • Did you have a look at org.apache.commons.lang.StringEscapeUtils.escapeSlq()? Btw 2000 rows isn't really that much, could you live without batching perhaps?. What about merging the 2 tables, since it is 1:1 anyway? – Curiosa Globunznik Apr 04 '20 at 19:27
  • Without batching it's taking well over a minute, with batching it's 5 seconds. Both tables have a lot of extra columns I haven't included< I would prefer to keep them separate and the Java models are also separate. Ill check that apache.commons method, thanks – Mark Apr 04 '20 at 19:48
  • That's valuable information. would you mind to cast it as an answer, I'd vote up. Otherwise I revive my answer and add your information. – Curiosa Globunznik Apr 06 '20 at 15:31

3 Answers3

1

Here's what I ended up doing, using getGeneratedKeys():

String orderItemSql = "INSERT INTO order_items (amount) VALUES (?) ";

for (int x = 1; x < orderItemDeliveries.size(); x++) {
    orderItemSql += ", (?)";
}

PreparedStatement preparedStatement = connection.prepareStatement(orderItemSql, Statement.RETURN_GENERATED_KEYS);

int i = 1;
for (int x = 0; x < orderItemDeliveries.size(); x++) {

    preparedStatement.setDouble(i++, orderItemDelivery.getAmount().doubleValue());

}

preparedStatement.executeUpdate();
Long ids[] = new Long[orderItemDeliveries.size()];

ResultSet rs = preparedStatement.getGeneratedKeys();
int x = 0;
while (rs.next()) {
    ids[x] = rs.getLong(1);
    x++;
}


String deliveryDetails = "INSERT INTO `delivery_details` (`orderItemId`, `message`) VALUES (?, ?)";
for (x = 1; x < orderItemDeliveries.size(); x++) {
    deliveryDetails += ", (?)";
}

preparedStatement = connection.prepareStatement(deliveryDetails);

i = 1;
for (x = 0; x < orderItemDeliveries.size(); x++) {
    orderItemDelivery = orderItemDeliveries.get(x);

    preparedStatement.setLong(i++, ids[x]);
    preparedStatement.setString(i++, orderItemDelivery.getMessage());
}

preparedStatement.executeUpdate();

So for this to work, the order of the ids must be sequential, and the order of orderItemDeliveries must not change between the first loop through of the list and the second.

This feels a bit hacky, but it works.

Curiosa Globunznik
  • 3,129
  • 1
  • 16
  • 24
Mark
  • 4,428
  • 14
  • 60
  • 116
0

Is it even possible with PreparedStatement ?

Good point, but since it is a 1:1 relationship you could use a separate sequence or AUTO_INCREMENT keys for each table, not last_insert_id(), given that they generate the same values for correlated records. In an oltp setting with concurrent transactions I wouldn't do that, but since you're batching anyway that may be reasonable. You could force exclusive access by locking both tables exclusively in advance, if that's an option.

Letting the application track the key values is also an option instead of using one autoinc field. Unfortunately mysql doesn't allow to select directly the next value from a sequence, as opposed to Oracle. E.g. this way: use a MAXKEY table with field MAX. Say you want to insert 10 rows, MAX is at 200. lock MAXKEY exclusively, select MAX (now you know, your keys can start with 200 + 1), update MAXKEY to 200 + 10, commit (releasing the lock). use 201...210 for 2 sets of batched inserts with prepared queries.

You could use a stored procedure to accept the values for both tables and insert seperately in bot of them (see this), again using last_insert_id(), and call the procedure in batched fashion (see this).

Eventually there are sql sanitizers, perhaps something on the line of org.apache.commons.lang.StringEscapeUtils.escapeSlq() may do.

But prepared statements also add other optimizations. The sql gets only sent once to the server, together with a 2-dimensional array of values. The parsed query can be cached and reused for subsequent calls. You should be able to see some more performance improvement just from that.

The string concatenation version sends the whole query for each row, all of them are different, need to be parsed and can't be found in the cache.

Curiosa Globunznik
  • 3,129
  • 1
  • 16
  • 24
  • The tables are a 1-1 but there's a lot of existing data and this has gotten out of sync i.e. the tables no longer have the same value for `id` for the same data item. Im going to check your stored procedure suggestion though. – Mark Apr 04 '20 at 19:46
  • yet another way: create a 3rd table with one autoinc field and the union of the fields of the other 2 tables, don't add any indexes to keep overhead low. add an after-insert trigger on the 3rd table, that inserts into the other 2. not so economical, though. – Curiosa Globunznik Apr 04 '20 at 19:53
  • Hmm a 3rd table. There's no straight-forward solution to what I thought must be a common problem! – Mark Apr 04 '20 at 20:05
0

I suggest you to try this. Even if it is not a batch approach, it is based upon PreparedStatement, which will always get a better performance over inlined SQL:

private void insertItems(Connection connection, Collection<OrderItemDelivery> orderItemDeliveries)
    throws SQLException
{
    try (PreparedStatement pst1=connection.prepareStatement("INSERT INTO order_items (amount) VALUES (?)", new String[] { "id"});
        PreparedStatement pst2=connection.prepareStatement("INSERT INTO delivery_details(orderItemId, message) VALUES (?, ?)"))
    {
        for (OrderItemDelivery orderItemDelivery : orderItemDeliveries)
        {
            pst1.setString(1, orderItemDelivery.getAmount());
            int x=pst1.executeUpdate();
            if (x != 1)
            {
                throw new SQLException("Row was not inserted");
            }
            try (ResultSet rs=pst1.getGeneratedKeys())
            {
                if (rs.next())
                {
                    long id=rs.getLong(1);
                    // TODO Fill the values in 2nd prepared statement and call executeUpdate().
                }
                else
                {
                    throw new SQLException("Id was not generated");
                }
            }
        }
    }
}

Note: You must try it first; not all db vendors implement the getGeneratedKeys method. In case yours does not, just replace the generated keys piece by a call to LAST_INSERT_ID: It should work the same.

Little Santi
  • 8,563
  • 2
  • 18
  • 46
  • Problem with this is that it involves 2,000 INSERTS for each table, so wouldn't be efficient. But im working on a solution that's similar, ill update the question when ready – Mark Apr 04 '20 at 20:35
  • It will be more efficient than inline SQL inserts, because using PreparedStatement saves time of hard-parsing at the DB engine. – Little Santi Apr 04 '20 at 21:32