
Possible Duplicate:
JDBC batch insert performance

I have the following method in a class:

    public void insertShingleSets(Vector<ShingleSet> shingleSets)
    {
        String sql = "INSERT INTO tblPostingsShingles (rowId, shingle) VALUES (?, ?);";

        PreparedStatement statement = null;
        try {
            statement = conn.prepareStatement(sql);
            for (int i = 0; i < shingleSets.size(); i++) {
                String id = shingleSets.get(i).getRowId();
                String shingle = shingleSets.get(i).getShingle();
                statement.setInt(1, Integer.parseInt(id));
                statement.setString(2, shingle);
                statement.addBatch();
                if ((i + 1) % 1000 == 0) {
                    System.out.println("doing a batch " + i);  // timed from here...
                    statement.executeBatch();                  // execute every 1000 items
                    System.out.println("done batch " + i);     // ...to here
                }
            }
            statement.executeBatch(); // flush any remaining items
        }
        catch (SQLException ex) {
            // handle any errors
            System.out.println("SQLException: " + ex.getMessage());
            System.out.println("SQLState: " + ex.getSQLState());
            System.out.println("VendorError: " + ex.getErrorCode());
        }
        finally {
            if (statement != null)
                try { statement.close(); } catch (SQLException logOrIgnore) {}
            if (conn != null)
                try { conn.close(); } catch (SQLException logOrIgnore) {}
        }
    }

The time between executing System.out.println("doing a batch " + i); and executing System.out.println("done batch " + i); is about 30 seconds, which seems like rather a lot considering it's only inserting two columns into a three-column table (the other column is an auto-increment primary key, and when starting/testing there were no rows in the table).

The only thing I can think of is that before this method is called, another method runs a query against tblPostingsShingles to check whether certain rowIds are present. However, I would have thought that any locks would be released when that method finishes (it has the same finally clause as this method).

Any advice would be greatly appreciated.

Dave

user725687

2 Answers


You can experiment with batch sizes, but I've had processes that seemed quite fast at 500 per batch, and 1000 is not overly large. There's no obvious reason I can see why this would take so long; it's easily an order of magnitude too slow.

Other possible performance bottlenecks are your MySQL configuration and network connection speed. Is this any faster than doing a single write at a time?

Steve B.
  • I'm a bit new to MySQL. However, it's installed on the same computer (i.e. a local server). I would have thought something would appear in mysql.slow_log, but it's empty. I haven't knowingly altered any of the configuration, but is there a way I can check it? It just seems ridiculously slow. I presume network connection speed isn't applicable here since everything is local? I timed how long it took to insert one row: 70,702,840 nanoseconds, which is about 0.07 seconds. Isn't that quite a lot? Thanks for any advice – user725687 Aug 10 '11 at 18:48
  • On your local box that seems very slow, unless your local box is a circa-1990 IBM PCjr. I don't think the slow log will show inserts, though I wouldn't swear to it. Also try turning off autocommit on the connection before batch processing and manually triggering the commit after. – Steve B. Aug 10 '11 at 22:48
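To illustrate the autocommit suggestion from the comment above, here is a minimal sketch of the batch loop with autocommit turned off and a manual commit after each executeBatch(). The insertInBatches method is illustrative, not the asker's code, and the Proxy-based stub is test scaffolding that just records JDBC calls so the sketch runs without a real database:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class BatchCommitSketch {

    // Hypothetical batch-insert helper: autocommit off, one commit per executed batch.
    static void insertInBatches(Connection conn, int rows, int batchSize) {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO tblPostingsShingles (rowId, shingle) VALUES (?, ?)")) {
            conn.setAutoCommit(false); // group each batch into one transaction
            for (int i = 0; i < rows; i++) {
                ps.setInt(1, i);
                ps.setString(2, "shingle" + i);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch();
                    conn.commit();     // manual commit, as suggested above
                }
            }
            ps.executeBatch();         // flush any remaining items
            conn.commit();
        } catch (SQLException ex) {
            System.out.println("SQLException: " + ex.getMessage());
        }
    }

    // Recording stub: a dynamic proxy standing in for Connection/PreparedStatement,
    // logging the interesting JDBC calls instead of talking to a database.
    static List<String> calls = new ArrayList<>();

    static <T> T stub(Class<T> iface) {
        InvocationHandler h = (proxy, method, args) -> {
            String name = method.getName();
            switch (name) {
                case "prepareStatement": calls.add(name); return stub(PreparedStatement.class);
                case "executeBatch":     calls.add(name); return new int[0];
                case "setAutoCommit":
                case "addBatch":
                case "commit":           calls.add(name); return null;
                default:
                    Class<?> rt = method.getReturnType();
                    if (rt == boolean.class) return false;
                    if (rt == int.class)     return 0;
                    if (rt == long.class)    return 0L;
                    return null; // void and object-returning methods we don't care about
            }
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[]{iface}, h));
    }

    public static void main(String[] args) {
        insertInBatches(stub(Connection.class), 2500, 1000);
        long batches = calls.stream().filter("executeBatch"::equals).count();
        long commits = calls.stream().filter("commit"::equals).count();
        // 2500 rows at batch size 1000: two full batches plus one final flush
        System.out.println("executeBatch calls: " + batches + ", commits: " + commits);
    }
}
```

With autocommit on, MySQL commits every statement individually, which can dominate insert time; batching the commits is often the single biggest win here.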

I would suggest experimenting with a smaller batch size. Generally, when I do batch inserts I limit the size to 20 to 50 rows at a time. Right now, with 1000-row batches, your job is likely blocking on database and network I/O. If you cut the batch size down you may be able to speed up the process.
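For context on the trade-off this raises: a smaller batch means more driver round trips, since inserting N rows takes ceil(N / batchSize) executeBatch() calls. A quick sketch of that arithmetic (the numbers are illustrative):

```java
public class BatchRoundTrips {

    // Number of executeBatch() round trips for a given batch size (ceiling division).
    static long roundTrips(long rows, long batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long rows = 10_000;
        for (long size : new long[]{20, 50, 200, 1000}) {
            // 20 -> 500 trips, 50 -> 200, 200 -> 50, 1000 -> 10
            System.out.println(size + " rows/batch: " + roundTrips(rows, size) + " round trips");
        }
    }
}
```

So shrinking the batch can only help if each round trip itself is cheap; if per-batch time scales with batch size (as the comment below reports), the bottleneck is elsewhere.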

jonathan.cone
  • Making the batch smaller can improve the time to process a single batch, but how will it affect overall processing time? – Dmitriy R Aug 10 '11 at 00:13
  • Sure. I reduced it from 1000 to 200, but it takes longer than a fifth of the time, i.e. it still takes about 8-10 seconds to do 200 rows – user725687 Aug 10 '11 at 18:12
  • Yeah, that's pretty bad. I found this: http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html, you might want to check out the part about LOCK TABLES, would you be able to test with that? – jonathan.cone Aug 10 '11 at 20:46
  • Thanks Jonathan. I think it was my computer or something - using another one and it seems ok. – user725687 Aug 25 '11 at 19:57