
I am designing a MySQL database which needs to handle about 600 row inserts per second across various InnoDB tables. My current implementation uses non-batched prepared statements. However, writing to the MySQL database bottlenecks and my queue size increases over time.

The implementation is written in Java (I don't know the version offhand) and uses MySQL's Java connector. I need to look into switching to JDBC tomorrow; I am assuming these are two different connector packages.

I have read several related threads on the issue, as well as material from the MySQL site.

My questions are:

  • Does anyone have advice or experience on the performance differences between INSERTs with prepared statements in batch mode vs. a single INSERT statement with multiple VALUES clauses?

  • What are the performance differences between the MySQL Java connector and JDBC? Should I be using one or the other?

  • The tables are for archival purposes, and will see ~90% writes to ~10% reads (maybe even less). I am using InnoDB. Is this the right choice over MyISAM?
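For concreteness, the two INSERT shapes in the first question can be sketched like this (the table name metrics and its columns are hypothetical; the batched variant needs a live connection, so only the multi-VALUES statement construction is shown):

```java
// Sketch of the multi-VALUES INSERT shape (table "metrics(a, b)" is hypothetical).
public final class InsertShapes {

    // Build "INSERT INTO metrics (a, b) VALUES (?, ?),(?, ?),..." for n rows.
    static String multiValuesSql(int rows) {
        final StringBuilder sb = new StringBuilder("INSERT INTO metrics (a, b) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append("(?, ?)");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The batched alternative prepares the single-row form once and calls
        // addBatch() per row; the multi-VALUES form carries all rows in one statement.
        System.out.println(multiValuesSql(3));
    }
}
```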

Thank you in advance for your help.

Darren

4 Answers


JDBC is simply the Java SE standard for database access: it defines the standard interfaces, so you're not bound to a specific JDBC implementation. The MySQL Java connector (Connector/J) is an implementation of the JDBC interfaces for MySQL databases only. From experience: I'm involved in a project that pushes a huge amount of data through MySQL, and we mostly prefer MyISAM for data that can be regenerated: it achieves much higher performance at the cost of losing transactions. Generally speaking, MyISAM is faster, but InnoDB is more reliable.

I wondered about the performance of INSERT statements too, about a year ago, and found the following old test code on my code shelf (sorry, it's a bit complex and somewhat beyond the scope of your question). The code below demonstrates four ways of inserting the test data:

  • single INSERTs;
  • batched INSERTs;
  • manual bulk INSERT (never use it - it's dangerous);
  • and finally, prepared bulk INSERT.

It uses TestNG as the runner, and relies on some legacy custom code:

  • the runWithConnection() method ensures that the connection is closed or returned to the connection pool after the callback is executed (note that the code below uses an unreliable statement-closing strategy, without try/finally, to keep the examples short);
  • IUnsafeIn<T, E extends Throwable> is a custom callback interface for methods that accept a single parameter but may throw an exception of type E, i.e. void handle(T argument) throws E;.
package test;

import test.IUnsafeIn;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import static java.lang.String.format;
import static java.lang.String.valueOf;
import static java.lang.System.currentTimeMillis;

import core.SqlBaseTest;
import org.testng.annotations.AfterSuite;
import org.testng.annotations.BeforeSuite;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public final class InsertVsBatchInsertTest extends SqlBaseTest {

    private static final int ITERATION_COUNT = 3000;

    private static final String CREATE_TABLE_QUERY = "CREATE TABLE IF NOT EXISTS ttt1 (c1 INTEGER, c2 FLOAT, c3 VARCHAR(5)) ENGINE = InnoDB";
    private static final String DROP_TABLE_QUERY = "DROP TABLE ttt1";
    private static final String CLEAR_TABLE_QUERY = "DELETE FROM ttt1";

    private static void withinTimer(String name, Runnable runnable) {
        final long start = currentTimeMillis();
        runnable.run();
        logStdOutF("%20s: %d ms", name, currentTimeMillis() - start);
    }

    @BeforeSuite
    public void createTable() {
        runWithConnection(new IUnsafeIn<Connection, SQLException>() {
            @Override
            public void handle(Connection connection) throws SQLException {
                final PreparedStatement statement = connection.prepareStatement(CREATE_TABLE_QUERY);
                statement.execute();
                statement.close();
            }
        });
    }

    @AfterSuite
    public void dropTable() {
        runWithConnection(new IUnsafeIn<Connection, SQLException>() {
            @Override
            public void handle(Connection connection) throws SQLException {
                final PreparedStatement statement = connection.prepareStatement(DROP_TABLE_QUERY);
                statement.execute();
                statement.close();
            }
        });
    }

    @BeforeTest
    public void clearTestTable() {
        runWithConnection(new IUnsafeIn<Connection, SQLException>() {
            @Override
            public void handle(Connection connection) throws SQLException {
                final PreparedStatement statement = connection.prepareStatement(CLEAR_TABLE_QUERY);
                statement.execute();
                statement.close();
            }
        });
    }

    @Test
    public void run1SingleInserts() {
        withinTimer("Single inserts", new Runnable() {
            @Override
            public void run() {
                runWithConnection(new IUnsafeIn<Connection, SQLException>() {
                    @Override
                    public void handle(Connection connection) throws SQLException {
                        for ( int i = 0; i < ITERATION_COUNT; i++ ) {
                            final PreparedStatement statement = connection.prepareStatement("INSERT INTO ttt1 (c1, c2, c3) VALUES (?, ?, ?)");
                            statement.setInt(1, i);
                            statement.setFloat(2, i);
                            statement.setString(3, valueOf(i));
                            statement.execute();
                            statement.close();
                        }
                    }
                });
            }
        });
    }

    @Test
    public void run2BatchInsert() {
        withinTimer("Batch insert", new Runnable() {
            @Override
            public void run() {
                runWithConnection(new IUnsafeIn<Connection, SQLException>() {
                    @Override
                    public void handle(Connection connection) throws SQLException {
                        final PreparedStatement statement = connection.prepareStatement("INSERT INTO ttt1 (c1, c2, c3) VALUES (?, ?, ?)");
                        for ( int i = 0; i < ITERATION_COUNT; i++ ) {
                            statement.setInt(1, i);
                            statement.setFloat(2, i);
                            statement.setString(3, valueOf(i));
                            statement.addBatch();
                        }
                        statement.executeBatch();
                        statement.close();
                    }
                });
            }
        });
    }

    @Test
    public void run3DirtyBulkInsert() {
        withinTimer("Dirty bulk insert", new Runnable() {
            @Override
            public void run() {
                runWithConnection(new IUnsafeIn<Connection, SQLException>() {
                    @Override
                    public void handle(Connection connection) throws SQLException {
                        final StringBuilder builder = new StringBuilder("INSERT INTO ttt1 (c1, c2, c3) VALUES ");
                        for ( int i = 0; i < ITERATION_COUNT; i++ ) {
                            if ( i != 0 ) {
                                builder.append(",");
                            }
                            builder.append(format("(%s, %s, '%s')", i, i, i));
                        }
                        final String query = builder.toString();
                        final PreparedStatement statement = connection.prepareStatement(query);
                        statement.execute();
                        statement.close();
                    }
                });
            }
        });
    }

    @Test
    public void run4SafeBulkInsert() {
        withinTimer("Safe bulk insert", new Runnable() {
            @Override
            public void run() {
                runWithConnection(new IUnsafeIn<Connection, SQLException>() {
                    private String getInsertPlaceholders(int placeholderCount) {
                        final StringBuilder builder = new StringBuilder("(");
                        for ( int i = 0; i < placeholderCount; i++ ) {
                            if ( i != 0 ) {
                                builder.append(",");
                            }
                            builder.append("?");
                        }
                        return builder.append(")").toString();
                    }

                    @SuppressWarnings("AssignmentToForLoopParameter")
                    @Override
                    public void handle(Connection connection) throws SQLException {
                        final int columnCount = 3;
                        final StringBuilder builder = new StringBuilder("INSERT INTO ttt1 (c1, c2, c3) VALUES ");
                        final String placeholders = getInsertPlaceholders(columnCount);
                        for ( int i = 0; i < ITERATION_COUNT; i++ ) {
                            if ( i != 0 ) {
                                builder.append(",");
                            }
                            builder.append(placeholders);
                        }
                        final int maxParameterIndex = ITERATION_COUNT * columnCount;
                        final String query = builder.toString();
                        final PreparedStatement statement = connection.prepareStatement(query);
                        int valueIndex = 0;
                        for ( int parameterIndex = 1; parameterIndex <= maxParameterIndex; valueIndex++ ) {
                            statement.setObject(parameterIndex++, valueIndex);
                            statement.setObject(parameterIndex++, valueIndex);
                            statement.setObject(parameterIndex++, valueIndex);
                        }
                        statement.execute();
                        statement.close();
                    }
                });
            }
        });
    }

}
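The runWithConnection() and logStdOutF() helpers above aren't shown. A minimal reconstruction of the IUnsafeIn callback pattern (an assumption for illustration, not the author's actual code, with a plain list standing in for the pooled connection) might look like:

```java
// Hypothetical reconstruction of the IUnsafeIn callback pattern described above;
// the real runWithConnection()/logStdOutF() helpers live in the author's codebase.
import java.util.ArrayList;
import java.util.List;

public final class HelperSketch {

    // A callback taking one argument and possibly throwing a checked exception E.
    interface IUnsafeIn<T, E extends Throwable> {
        void handle(T argument) throws E;
    }

    // Stand-in for runWithConnection(): acquire a resource, run the callback,
    // then release the resource in a finally block (here the "resource" is a log).
    static List<String> runWithResource(IUnsafeIn<List<String>, RuntimeException> callback) {
        final List<String> resource = new ArrayList<>();
        resource.add("acquired");
        try {
            callback.handle(resource);
        } finally {
            resource.add("released");
        }
        return resource;
    }

    public static void main(String[] args) {
        List<String> log = runWithResource(r -> r.add("used"));
        System.out.println(log);
    }
}
```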

Take a look at the methods annotated with @Test: they actually execute the INSERT statements. Also take a look at the CREATE_TABLE_QUERY constant: it uses InnoDB, producing the following results on my machine with MySQL 5.5 (MySQL Connector/J 5.1.12):

InnoDB
Single inserts: 74148 ms
Batch insert: 84370 ms
Dirty bulk insert: 178 ms
Safe bulk insert: 118 ms

If you change InnoDB to MyISAM in CREATE_TABLE_QUERY, you'll see a significant performance increase:

MyISAM
Single inserts: 604 ms
Batch insert: 447 ms
Dirty bulk insert: 63 ms
Safe bulk insert: 26 ms

Hope this helps.

UPD:

For the 4th way you must set max_allowed_packet in my.ini (or my.cnf), under the [mysqld] section, to a value large enough to support really big packets.
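For example, a fragment like the following (the 64M value is purely illustrative; size it to your largest expected statement):

```ini
[mysqld]
max_allowed_packet=64M
```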

Lyubomyr Shaydariv
  • Thanks for the benchmarks, that was the most straightforward answer I could have asked for. I implemented batched prepared inserts today and it worked like a charm! – Darren Jul 10 '12 at 01:57
  • Any idea why batch insert is slower than single inserts on InnoDB? – stracktracer Nov 15 '12 at 23:17
  • @stracktracer hmmm yes, indeed. It's hard to say, because it probably might be relevant to a specific test case only. Thank you for the notice. – Lyubomyr Shaydariv Dec 15 '12 at 06:24

I know this thread is pretty old, but I just thought I would mention that adding rewriteBatchedStatements=true to the JDBC URL when using MySQL can result in huge performance gains with batched statements.
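For example (the host, port, and database name here are placeholders):

```
jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
```

With this property set, Connector/J rewrites a batch of single-row prepared INSERTs into multi-VALUES form before sending them to the server.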

Jordan L
  • This is the first time I have come across the 'rewriteBatchedStatements' setting. With large batch statements this increases the speed by orders of magnitude and is just what I was looking for. See https://stackoverflow.com/questions/26307760/ for more info. – Forge_7 May 15 '18 at 14:16

Do you have any triggers on any of the affected tables? If not, 600 inserts per second does not look like a lot.

Batch insert functionality in JDBC issues the same statement multiple times in the same transaction, whereas a multi-value SQL statement squeezes all values into a single statement. With a multi-value statement you have to construct the INSERT SQL dynamically, which adds overhead in terms of more code, more memory, SQL injection protection mechanisms, etc. Try the regular batch functionality first; for your workload, it shouldn't be a problem.

If you don't receive data in batches, consider batching it before insert. We use a queue on a separate thread to implement a producer-consumer arrangement, holding back inserts until a certain time has elapsed or the queue size has crossed a threshold.

In case you want the producer to be notified about a successful insert, then some more plumbing is required.

Sometimes just blocking on the thread can be more straightforward and practical.

if (System.currentTimeMillis() - lastInsertTime > TIME_THRESHOLD || queue.size() > SIZE_THRESHOLD) {
    lastInsertTime = System.currentTimeMillis();
    // Insert logic
} else {
    // Do nothing OR sleep for some time OR retry after some time.
}
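A more complete, runnable sketch of that size/time-threshold flush (the thresholds, the String row type, and class names are illustrative; the flush step is where the batch INSERT would go):

```java
// Sketch of a size/time-threshold insert buffer; no real database involved,
// flushed batches are collected into a list so the behaviour can be observed.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public final class ThresholdBuffer {
    static final int SIZE_THRESHOLD = 3;
    static final long TIME_THRESHOLD_MS = 1000;

    private final Queue<String> queue = new ArrayDeque<>();
    private long lastInsertTime = System.currentTimeMillis();
    final List<List<String>> flushedBatches = new ArrayList<>();

    void offer(String row) {
        queue.add(row);
        long now = System.currentTimeMillis();
        if (now - lastInsertTime > TIME_THRESHOLD_MS || queue.size() > SIZE_THRESHOLD) {
            lastInsertTime = now;
            flushedBatches.add(new ArrayList<>(queue)); // here: run the batch INSERT
            queue.clear();
        }
    }

    public static void main(String[] args) {
        ThresholdBuffer buffer = new ThresholdBuffer();
        for (int i = 0; i < 5; i++) {
            buffer.offer("row" + i);
        }
        // The size threshold fires once the queue exceeds SIZE_THRESHOLD rows.
        System.out.println(buffer.flushedBatches.size());
    }
}
```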
Dojo
  • Thank you for your advice. I did some research today and created a rudimentary producer-consumer relationship. My data processor works in one thread, adding information to a queue belonging to the MySQL insert thread. It seems to work nicely. I was using InnoDB because there were some important foreign key relations I was going to try to keep. It seems they might not really be necessary in the grand scheme of things, so I might switch to MyISAM tomorrow and see how things go. – Darren Jul 10 '12 at 02:01

After some testing of my own, Jordan L gave the best tip. I think the execution times Lyubomyr gave for the InnoDB non-dirty batch inserts are wrong, because he most likely didn't use rewriteBatchedStatements=true in the JDBC connection string. Without it, batches are worthless. In my own testing, the non-dirty batch inserts using prepared statements were even faster than the dirty way of doing it with prepared statements.

user223304