0

I'm inserting a few million of rows into a MySql table. I'm using prepared statement as shown below.

Would creating a single insert string like the one directly below be expected to be substantially faster?

Single string approach from Is 22 seconds a good time for inserting 500 rows in mysql? :

INSERT INTO example
  (example_id, name, value, other_value)
VALUES
  (100, 'Name 1', 'Value 1', 'Other 1'),
  (101, 'Name 2', 'Value 2', 'Other 2'),
  (102, 'Name 3', 'Value 3', 'Other 3'),
  (103, 'Name 4', 'Value 4', 'Other 4');

What I'm currently doing:

// 
// method to do upload
// 

public static void doUpload(Connection conn) {
    log.info("Deleting existing data...");
    Database.update("truncate table attribute", conn);
    log.info("Doing inserts");
    String sqlString = "insert into attribute values (null,?,?,?)";
    int max = 1000000;
    PreparedStatement ps = Database.getPreparedStatement(sqlString, conn);
    for (int i = 0; i < max; i++) {
        // add params
        String subjectId = i+"";
        addParam(subjectId, "GENDER", getGender(), ps);
        addParam(subjectId, "AGE", getAge(), ps);
        addParam(subjectId, "CITY", getCity(), ps);
        addParam(subjectId, "FAVORITE_COLOR", getColor(), ps);
        addParam(subjectId, "PET", getPet(), ps);
        if (i % 1000 == 0) {
            log.info("Executing " + i + " of " + max);
            Database.execute(ps);
            log.info("Done with batch update");
            ps = Database.getPreparedStatement(sqlString, conn);
        }
    }
    if (Database.isClosed(ps) == false) {
        Database.execute(ps);
    }
}

// 
// method to add param to the prepared statement
// 

private static void addParam(String subjectId, String name, String val, PreparedStatement ps) {
    ArrayList<String> params;
    params = new ArrayList<String>();
    params.add(subjectId + "");
    params.add(name);
    params.add(val);
    Database.addToBatch(params, ps);
}

// 
// addToBatch
// 

public static void addToBatch(List<String> params, PreparedStatement ps) {
    try {
        for (int i = 0; i < params.size(); i++) {
            ps.setString((i + 1), params.get(i));
        }
        ps.addBatch();
    } catch (Exception exp) {
        throw new RuntimeException(exp);
    }
}

What is the fastest way to do this type of insert?

I'm currently inserting 1000 rows in about 5 seconds. Is it reasonable to expect much better than this? I'm running locally and have already dropped all indexes on the table I'm inserting into.

John
  • 3,458
  • 4
  • 33
  • 54
  • It depends on the context. If inserting millions of row at the time is something your application is going to do very often. Than maybe the performance gains over the security gains are worth it. If not, i'd say go with prepared statement. – Nicolas Aug 15 '20 at 23:38
  • 1
    You might like my presentation [Load Data Fast!](https://www.slideshare.net/billkarwin/load-data-fast) – Bill Karwin Aug 15 '20 at 23:47
  • You're doing this wrong. You should be calling [`executeBatch()`](https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#executeBatch--), and reusing the `PreparedStatement`. It is certainly quicker than building SQL strings yourself, and more economical of memory. – user207421 Aug 16 '20 at 03:45
  • Thanks @BillKarwin. I'm getting about 25k rows per second now using Java ps and rewriteBatchedStatements=true which is good enough for now. I had forgot about MySql's loadfile feature but I don't think I need to go in that direction at this point. Thanks for the great article, I'll be sure to pass the link around. – John Aug 17 '20 at 02:28

3 Answers3

1

The fastest way to do batch inserts with JDBC is use addBatch / executeBatch, which you appear to be already doing.

For sample code, see

But that will only get you so much performance. For a real performance boost, add rewriteBatchedStatements=true to your JDBC url. You will see a significant improvement.

See MySQL and JDBC with rewriteBatchedStatements=true

Keep in mind that what you suggest in your "Single String Approach" is similar, but rewriteBatchedStatements=true also makes the network communication with the database more efficient.

GreyBeardedGeek
  • 29,460
  • 2
  • 47
  • 67
  • Yup, rewriteBatchedStatements=true added to the url for the connection worked magic. What took about 5 seconds is now taking about 0.1 second. – John Aug 16 '20 at 17:40
1

Not sure what Database.getPreparedStatement is doing but you usually do not need to recreate PreparedStatement object after each batch execution, you can still reuse it. Also have you tried to set larger batch size? As of now your batch size is 1000, have you tried making it bigger?

Ivan
  • 8,508
  • 2
  • 19
  • 30
0

Prepared statements offer the advantage of security. Theoretically, the prepared statement is precompiled and should still offer better performance.

Blacksmith
  • 712
  • 8
  • 21