
I need to insert a couple hundred million records into a MySQL database. I'm batch inserting 1 million rows at a time. Please see my code below. It seems slow. Is there any way to optimize it?

try {
        // Disable auto-commit so the whole batch commits as one transaction
        connection.setAutoCommit(false);

        // Create a prepared statement
        String sql = "INSERT INTO mytable (xxx) VALUES (?)";
        PreparedStatement pstmt = connection.prepareStatement(sql);

        Object[] vals = set.toArray();
        for (int i = 0; i < vals.length; i++) {
            pstmt.setString(1, vals[i].toString());
            pstmt.addBatch();
        }

        // Execute the batch and commit
        int[] updateCounts = pstmt.executeBatch();
        connection.commit();
        System.out.println("inserted " + updateCounts.length);
} catch (SQLException e) {
        e.printStackTrace();
}
BalusC
    Your code got a little corrupted (and truncated prematurely) – Uri Jun 07 '10 at 21:19
  • BTW, which driver are you using? A general JDBC, or the JDBC-Mysql connector? – Uri Jun 07 '10 at 21:20
  • I'm using com.mysql.jdbc.Driver –  Jun 07 '10 at 21:26
  • How long does it take? What's the comparison material on which you concluded that it's slow? – BalusC Jun 07 '10 at 21:28
  • I'm only doing a test insert of 1 million into my local pc(4gb ram), it took about 10 min, just wondering if there's any room for improvement –  Jun 07 '10 at 22:04

6 Answers


I had a similar performance issue with mysql and solved it by setting the useServerPrepStmts and the rewriteBatchedStatements properties in the connection url.

Connection c = DriverManager.getConnection("jdbc:mysql://host:3306/db?useServerPrepStmts=false&rewriteBatchedStatements=true", "username", "password");
Bertil Chapuis
  • 2,477
  • 1
  • 18
  • 12
  • Nice! I'm seeing a 3x improvment – Kimble Dec 10 '12 at 11:50
  • 4
    @Kimble - so why not accept this answer? Thanks, mate! This works like magic! – Peter Perháč Apr 04 '13 at 08:56
  • OMG! Adding the above params to my connection URL sped up the batch inserts nearly 30x. I am not sure what other implications these variables have. But amazing it is! Thanks. – Keshav Apr 20 '13 at 05:26
  • 5x Improvement! But I want more. – Crocode May 31 '13 at 01:28
  • 3
    10 seconds instead of one hour. Well deserved +1! – 0x6B6F77616C74 Feb 22 '15 at 00:57
  • 4
    The [MySQL reference documentation for the driver properties](http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html) has moved. – ben3000 Apr 21 '15 at 06:26
  • 1
    I noticed another requirement with insert batching when using the Java driver (at least in the 5.1.10 version of the driver): In the SQL statement, the "VALUES" part must be followed by a space, not directly by the opening parenthesis, or else the driver will fall back to sequential inserts. – Christian Semrau Aug 04 '16 at 09:21
  • I'm seeing a 40x improvement. Why didn't the OP accept the answer? I completely missed this answer earlier. – Nav Feb 10 '17 at 11:57
  • 3
    Can you explain why adding useServerPrepStmts=false works? After reading https://stackoverflow.com/questions/32286518/whats-the-difference-between-cacheprepstmts-and-useserverprepstmts-in-mysql-jdb I would assume it would be the opposite. I'm sure you're right I just don't fully understand why/how that settings improves performance? Thank you. – Stephane Grenier Aug 22 '17 at 03:19
  • Why the heck wouldn't this be the default? – Greg Brown Jan 17 '19 at 21:49
  • how to check if these connection properties are working or not?? Cause I have been using these properties but it still cant make the execution faster. is there a way to check that if these properties are working so that I can check where the issue is? Thanks. – JustCurious Jan 09 '20 at 05:22
  • @JustCurious - In my case as well, still no improvements by adding `useServerPrepStmts=false&rewriteBatchedStatements=true` and also `spring.jdbc.getParameterType.ignore=true`, I am using `NamedParameterJdbcTemplate.batchUpdate` – PAA Feb 11 '21 at 11:31
  • In my case, what actually worked was `rewriteBatchedStatements=true`. set `useServerPrepStmts=false` did not help – Harper Mar 03 '21 at 12:15
  • This solution not working for the Postgres DB. – PAA Feb 14 '22 at 17:23

I'd like to expand on Bertil's answer, as I've been experimenting with the connection URL parameters.

rewriteBatchedStatements=true is the important parameter. useServerPrepStmts is already false by default, and even changing it to true doesn't make much difference in terms of batch insert performance.

It's worth explaining how rewriteBatchedStatements=true improves performance so dramatically: the driver rewrites batched prepared INSERT statements into a single multi-value INSERT when executeBatch() is called (Source). That means that instead of sending the following n INSERT statements to the MySQL server each time executeBatch() is called:

INSERT INTO X VALUES (A1,B1,C1)
INSERT INTO X VALUES (A2,B2,C2)
...
INSERT INTO X VALUES (An,Bn,Cn)

it sends a single INSERT statement:

INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2),...,(An,Bn,Cn)

You can observe this by turning on the MySQL general log (with SET GLOBAL general_log = 1), which logs every statement sent to the server to a file.
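To make the rewrite concrete, here is a minimal sketch of the transformation the driver performs; the class and method names are mine for illustration, not part of Connector/J:

```java
import java.util.List;
import java.util.StringJoiner;

public class BatchRewriteDemo {

    // Builds the single multi-value INSERT that rewriteBatchedStatements=true
    // produces from n individually batched INSERT rows.
    static String multiValueInsert(String table, List<String[]> rows) {
        StringJoiner values = new StringJoiner(",");
        for (String[] row : rows) {
            StringJoiner tuple = new StringJoiner(",", "(", ")");
            for (String col : row) {
                tuple.add(col);
            }
            values.add(tuple.toString());
        }
        return "INSERT INTO " + table + " VALUES " + values;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"A1", "B1", "C1"},
                new String[]{"A2", "B2", "C2"});
        // Prints: INSERT INTO X VALUES (A1,B1,C1),(A2,B2,C2)
        System.out.println(multiValueInsert("X", rows));
    }
}
```

One round trip per batch instead of one per row is where the speedup comes from.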

Eran
  • 387,369
  • 54
  • 702
  • 768
  • Eran - what about Updates, does rewriteBatchedStatements=true improve performance on updates same as on inserts, because the syntax on update isn't the same as on insert, and it cannot execute one as a whole(as per my knowledge). Saying this because i have like for a 10k+ rows maybe less than a second on insert, and for 100 rows 1 second on update. – rpajaziti Jun 12 '20 at 00:09
  • Thanks for the explanation. I'm a bit astounded that the JDBC driver is sending a long SQL string instead of using a binary API. – Sam Goldberg Dec 27 '22 at 16:27
  • @SamGoldberg well, this answer is 8.5 years old, so I'm not sure if the JDBC driver still behaves this way. – Eran Jan 01 '23 at 15:06

You can insert multiple rows with one INSERT statement; doing a few thousand at a time can greatly speed things up. That is, instead of doing e.g. 3 inserts of the form INSERT INTO tbl_name (a,b,c) VALUES(1,2,3);, you do INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(1,2,3),(1,2,3);. (It might be that JDBC's .addBatch() does a similar optimization now, though the MySQL addBatch used to be entirely un-optimized and just issued individual queries anyway; I don't know if that's still the case with recent drivers.)

If you really need speed, load your data from a comma-separated file with LOAD DATA INFILE; we get around a 7-8x speedup doing that vs. doing tens of millions of inserts.

nos
  • 223,662
  • 58
  • 417
  • 506
  • load data infile might be a good alternative, but my input file needs a cleanup, I'm only interested in inserting certain rows where the second token matches a string(space delimited tokens), is load data infile flexible enough to filter rows? –  Jun 07 '10 at 22:07
  • 3
    I don't think it can filter, but you can clean up the data yourself , write a new file with the cleaned data and load that file. – nos Jun 09 '10 at 23:27
  • My inserts are 10x faster now! – Matt Sgarlata Aug 09 '13 at 16:18
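As a sketch of the cleanup pass suggested in the comments: filter on the second space-delimited token, then write a comma-separated file for LOAD DATA INFILE. The helper names and the token to match are illustrative assumptions, not from the answer:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class CleanForLoadData {

    // Keep only lines whose second space-delimited token equals `wanted`,
    // rewritten as comma-separated rows suitable for LOAD DATA INFILE.
    static List<String> filterRows(List<String> lines, String wanted) {
        return lines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(t -> t.length > 1 && t[1].equals(wanted))
                .map(t -> String.join(",", t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        List<String> raw = List.of("1 keep foo", "2 drop bar", "3 keep baz");
        Path out = Files.createTempFile("cleaned", ".csv");
        Files.write(out, filterRows(raw, "keep"));
        // Then, on the MySQL side:
        //   LOAD DATA INFILE '/path/to/cleaned.csv' INTO TABLE mytable
        //     FIELDS TERMINATED BY ',';
        System.out.println("wrote " + out);
    }
}
```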

If:

  1. It's a new table, or the amount to be inserted is greater than the data already in the table
  2. There are indexes on the table
  3. You do not need other access to the table during the insert

Then ALTER TABLE tbl_name DISABLE KEYS can greatly improve the speed of your inserts. When you're done, run ALTER TABLE tbl_name ENABLE KEYS to start building the indexes, which can take a while, but not nearly as long as doing it for every insert.
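A minimal sketch of that pattern over JDBC; the table name and the try/finally wiring are my own illustration, not from the answer:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class BulkInsertWithKeysDisabled {

    static String disableKeys(String table) {
        return "ALTER TABLE " + table + " DISABLE KEYS";
    }

    static String enableKeys(String table) {
        return "ALTER TABLE " + table + " ENABLE KEYS";
    }

    // Defers index maintenance until after the bulk insert; the finally
    // block makes sure ENABLE KEYS runs even if the insert fails.
    static void loadWithKeysDisabled(Connection conn, String table, Runnable bulkInsert)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(disableKeys(table));
            bulkInsert.run();
        } finally {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(enableKeys(table));
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(disableKeys("mytable"));
        System.out.println(enableKeys("mytable"));
    }
}
```

Note that DISABLE KEYS only affects non-unique indexes and is a MyISAM feature; depending on your storage engine it may be a no-op.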

Wrikken
  • 69,272
  • 8
  • 97
  • 136

You may try using a DDBulkLoad object (from the DataDirect JDBC drivers).

// Get a DDBulkLoad object
DDBulkLoad bulkLoad = DDBulkLoadFactory.getInstance(connection);
bulkLoad.setTableName("mytable");
bulkLoad.load("data.csv");
Lalith
  • 365
  • 3
  • 4
try {
        // Disable auto-commit
        connection.setAutoCommit(false);
        int maxInsertBatch = 10000;
        // Create a prepared statement
        String sql = "INSERT INTO mytable (xxx) VALUES (?)";
        PreparedStatement pstmt = connection.prepareStatement(sql);

        Object[] vals = set.toArray();
        int count = 0;
        for (int i = 0; i < vals.length; i++) {
            pstmt.setString(1, vals[i].toString());
            pstmt.addBatch();
            count++;
            // Flush every maxInsertBatch rows to keep memory usage bounded
            if (count % maxInsertBatch == 0) {
                pstmt.executeBatch();
            }
        }

        // Execute the remaining batch and commit
        pstmt.executeBatch();
        connection.commit();
        System.out.println("inserted " + count);
} catch (SQLException e) {
        e.printStackTrace();
}
  • instead of downvoting there could be comment on this, why it can or cannot improve performance when executing several batches in between and not all at once... – benez May 17 '17 at 14:28
  • Looks like this answer has almost the same code snippet as the one in the question. – suhas0sn07 Apr 10 '19 at 04:53