
I'm trying to read a table from a Sybase server, process the rows, and output the results to another table. (My code is below.)

The code retrieves the table pretty fast and processes it equally fast (it gets to the part where it sends within 30 seconds). But when I run executeBatch it sits there for 20 minutes before finishing (FYI, the table I'm testing with has 8,400 rows).

Is there a more efficient way to do this? I'm flexible about how I receive or send the queries (I can create a new table, update a table, etc.); I just don't know why this is so slow (I'm sure the data is < 1 MB, and I'm sure it doesn't take the SQL server 20 minutes to parse 8,400 rows). Any ideas?

Note: the reason this is really bad for me is that I eventually have to process a table with 1.2 million rows (the table I'm working with right now is a test table with 8,400 rows).

    Connection conn = DriverManager.getConnection(conString, user, pass);

    String sql = "SELECT id,dateid,attr from user.fromtable";
    Statement st = conn.createStatement();
    ResultSet rs = st.executeQuery(sql);

    String sqlOut = "INSERT INTO user.mytabletest (id,attr,date,estEndtime) values (?,?,?,?)";
    PreparedStatement ps = conn.prepareStatement(sqlOut);

    int i=1;

    while(rs.next())
    {
        int date = rs.getInt("dateid");
        String attr = rs.getString("attr");
        String id = rs.getString("id");

        Time tt = getTime(date,attr);
        Timestamp ts = new Timestamp(tt.getTime());

        ps.setString(1, id);
        ps.setString(2, attr);
        ps.setInt(3, date);
        ps.setTimestamp(4, ts);
        ps.addBatch();

        if(i % 10000 == 0)
        {
            System.out.println(i);
            ps.executeBatch();
            conn.commit();
            ps.clearBatch();                
        }

        i++;
    }
    System.out.println("sending "+(new Date()));
    int[] results = ps.executeBatch();
    System.out.println("committing "+(new Date()));
    conn.commit();
    System.out.println("done "+(new Date()));
user1167650

4 Answers

To work with batches effectively you should turn the auto-commit option off and turn it back on after executing the batch (or, alternatively, call the connection.commit() method):

connection.setAutoCommit(false);
while (rs.next()) {
    // ... set parameters as before ...
    ps.addBatch();
}
int[] results = ps.executeBatch();
connection.setAutoCommit(true);
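
Applied to the question's loop, a minimal sketch might look like this (the 1000-row flush interval and the method wrapper are illustrative choices, not something from the original post):

// Sketch: auto-commit off, periodic flushes, one commit at the end.
// Variable names follow the question's code.
static void copyRows(Connection conn, ResultSet rs, PreparedStatement ps) throws SQLException {
    conn.setAutoCommit(false);
    int count = 0;
    while (rs.next()) {
        // ... read the row and set the four parameters as in the question ...
        ps.addBatch();
        if (++count % 1000 == 0) {
            ps.executeBatch();   // send this chunk to the server
        }
    }
    ps.executeBatch();           // flush the remaining rows
    conn.commit();               // one commit for the whole load
    conn.setAutoCommit(true);
}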
yggdraa

Add ?rewriteBatchedStatements=true to the end of your JDBC URL. It will give you a serious performance improvement. Note that this is specific to MySQL; it won't have any effect with any other JDBC driver.

E.g.: jdbc:mysql://server:3306/db_name?rewriteBatchedStatements=true

It improved my performance by more than 15 times.
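
The flag can also be set programmatically; a minimal sketch, assuming MySQL Connector/J and placeholder host/credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class BatchRewriteDemo {
    public static Connection open() throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "user");        // placeholder credentials
        props.setProperty("password", "pass");
        // Same effect as appending ?rewriteBatchedStatements=true to the URL;
        // recognized by MySQL Connector/J only.
        props.setProperty("rewriteBatchedStatements", "true");
        return DriverManager.getConnection("jdbc:mysql://server:3306/db_name", props);
    }
}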

Suryakant Pandey

I had this same problem and finally figured it out, though I also was not able to find the right explanation anywhere.

The answer is that for simple unconditional inserts, .executeBatch() should not be used. What batch mode does is send lots of individual "INSERT INTO table x ..." statements, and that is why it runs slowly. However, if the insert statements were more complex, possibly with conditions that affect each row differently, then individual insert statements might be required and a batch execution would actually be useful.

An example of what works: try the following, which builds a single multi-row insert statement as a PreparedStatement (the same concept would apply with a Statement object) and solves the slowness problem:

public boolean addSetOfRecords(String tableName, Set<MyObject> objects) {
    // Build one multi-row insert: INSERT INTO t VALUES (?,?,?,?),(?,?,?,?),...
    // Assumes objects is non-empty.
    StringBuilder sql = new StringBuilder("INSERT INTO " + tableName + " VALUES (?,?,?,?)");
    for (int i = 1; i < objects.size(); i++) {
        sql.append(",(?,?,?,?)");
    }
    // try-with-resources closes the statement even if execute() throws
    try (PreparedStatement p = db.getConnection().prepareStatement(sql.toString())) {
        int i = 1;
        for (MyObject obj : objects) {
            p.setString(i++, obj.getValue());
            p.setString(i++, obj.getType());
            p.setString(i++, obj.getId());
            p.setDate(i++, new Date(obj.getRecordDate().getTime()));
        }
        p.execute();
        return true;
    } catch (SQLException e) {
        e.printStackTrace();
        return false;
    }
}
Josh Wortman
  • This put an end to hours of struggling. Thanks! – tonyd Mar 01 '15 at 03:11
  • using rewriteBatchedStatements=true in the connection URL string does the same thing! – DebashisDeb May 23 '18 at 07:44
  • On databases that support batch inserts, and where the JDBC driver has been written to support that RDBMS functionality, the driver leverages the underlying wire protocol to execute one prepared statement against the connection with an attached data buffer containing all of the supplied rows. Individual SQL per row is a fallback for database+driver combinations that do not support batch inserts. – pojo-guy Sep 26 '18 at 13:39
  • There is a maximum number of parameters that can be generated this way, though. For instance, when I tried this approach with SQL Server, I could only set up to 2100 parameter values. – mareck_ste Jun 01 '21 at 13:08
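
Building on the previous comment, a hedged sketch of how one might chunk the multi-row insert to stay under such parameter caps (the 500-row chunk size and the List-based wrapper are assumptions; it reuses the answer's addSetOfRecords):

import java.util.LinkedHashSet;
import java.util.List;

// Sketch: split the rows into chunks so rows * 4 parameters stays below the
// driver/database cap (e.g. ~2100 on SQL Server). Chunk size is arbitrary.
public boolean addAllInChunks(String tableName, List<MyObject> objects) {
    final int chunkRows = 500; // 500 rows * 4 params = 2000 parameters
    for (int from = 0; from < objects.size(); from += chunkRows) {
        int to = Math.min(from + chunkRows, objects.size());
        // Reuse the answer's addSetOfRecords for each chunk.
        if (!addSetOfRecords(tableName, new LinkedHashSet<>(objects.subList(from, to)))) {
            return false; // stop on the first failed chunk
        }
    }
    return true;
}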

There is a commercial solution from Progress DataDirect that translates JDBC batches into the database's native bulk-load protocol to significantly improve performance. It's very popular with SQL Server since it does not require BCP. I am employed by that vendor and wrote a blog post on how to bulk insert with JDBC batches.

Sumit Sarkar