3

I have a program which reads a text file line by line, and creates a Hibernate entity object from each line, and saves them. I have several such text files to process, each of which has about 300,000 lines. I'm finding that my current implementation is excruciatingly slow, and I'm wondering if there's anything I can do to improve things.

My main method processes the text file line by line like so:

// read the file line by line
FileInputStream fileInputStream = new FileInputStream(new File(fileName));
InputStreamReader inputStreamReader = new InputStreamReader(fileInputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
int lineCount = 0;
String line = bufferedReader.readLine();
while (line != null)
{
    // convert the line into an Observations object and persist it
    convertAndPersistObservationsLine(line);

    // if the number of lines we've processed has built up to the JDBC batch size then flush
    // and clear the session in order to control the size of Hibernate's first level cache
    lineCount++;
    if (lineCount % JDBC_CACHE_SIZE == 0)
    {
        observationsDao.flush();
        observationsDao.clear();
    }

    line = bufferedReader.readLine();
}

The convertAndPersistObservationsLine() method just splits the text line into tokens, creates a new entity object, populates the entity's fields with data from the tokens, and then saves the object via a DAO that calls Hibernate's Session.saveOrUpdate() method. The DAO methods flush() and clear() are direct calls to the corresponding Hibernate Session methods.

The Hibernate property 'hibernate.use_second_level_cache' is set to false, and the Hibernate property 'hibernate.jdbc.batch_size' is set to 50, as is the Java constant JDBC_CACHE_SIZE.

Can someone suggest a better way of going about this, or any tweaks to the above which may improve the performance of this batch loading program?

Thanks in advance for your help.

--James

James Adams
  • 8,448
  • 21
  • 89
  • 148
  • 2
    Do you have transactions properly set up, without autocommit etc? Does your dao not accidentaly flush the session on persist? Can you profile the code to see where most time is spent? Can you enable the sql log to verify that there are no intermediate flushes? – Alexander Torstling Aug 12 '10 at 16:31
  • Is it preferable that I set the Hibernate property 'hibernate.connection.autocommit' to false (and is this value true by default)? My DAO does not call flush() on persist, it only calls saveOrUpdate() before returning. I have not profiled the code yet nor monitored the log for intermediate flushes, thanks for these suggestions. – James Adams Aug 12 '10 at 16:47
  • Adam I would suggest taking a look at logs generated by hibernate and spring to see what is going on during the insertions; I think that either Spring or Hibernate disables autocommit during a transaction (making the setting irrelevant, but you should check to be sure). – matt b Aug 12 '10 at 18:10

3 Answers3

8

The code itself and the Hibernate configuration look correct (by correct I mean that they follow the batch insert idiom from the documentation). But here are some additional suggestions:

As already mentioned, make absolutely sure that you aren't using an ID generator that defeats batching like IDENTITY. When using GenerationType.AUTO, the persistence provider will pick an appropriate strategy depending on the database so, depending on your database, you might have to change that for a TABLE or SEQUENCE strategy (because Hibernate can cache the IDs using an hi-lo algorithm).

Also make sure that Hibernate is batching as expected. To do so, activate logging and monitor the BatchingBatcher to track the size of the batch it's executing (will be logged).

In your particular case, you might actually consider using the StatelessSession interface (once the problem will be solved of course).

Pascal Thivent
  • 562,542
  • 136
  • 1,062
  • 1,124
3

A few things:

  1. Can you quantify "excruciatingly slow"? How many inserts per second are you achieving? What rate do you think you should have instead? What type of load is the database itself under? Are others reading from the table at the same time?

  2. How are you connecting to the database? Is all of this occurring in a single transaction re-using the same connection?

  3. Are you by any chance using an identity identifier? The documentations states that JDBC batching is disabled silently if you are:

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

matt b
  • 138,234
  • 66
  • 282
  • 345
  • Thanks for the response, Matt. 1. It looks like I am getting roughly 4 inserts per second. Can I expect to dramatically improve this? 2. My DAO objects contain a SessionFactory object which is wired in via Spring dependency injection. The SessionFactory class I'm using is org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean. The transaction manager class I'm using is org.springframework.orm.hibernate3.HibernateTransactionManager and I have enabled transactional behavior based on annotations in my Spring application context config file. – James Adams Aug 12 '10 at 16:27
  • The DAO class has this as the transactional setting: @Transactional(propagation = Propagation.REQUIRED, readOnly = false, isolation = Isolation.DEFAULT) The SessionFactory has a DataSource property, and this is configured in Spring application context as well with normal settings for URL, user/password, driver class, etc. 3. I don't think I'm using an identity identifier, the ID field for my entity class looks like this: @Id @Column(name = "ENTITY_ID") @GeneratedValue(strategy = GenerationType.AUTO) public T getId() { return id; } – James Adams Aug 12 '10 at 16:33
  • I apologize for not formatting the above comments -- I tried using basic HTML and Markdown but was never able to get anything to take and it always ended up as one run-on string. – James Adams Aug 12 '10 at 16:39
1

If you are using MySQL, you might want to turn on rewriteBatchedStatements since MySQL does not support batch form of prepare statement parameter bindings. It will rewrite your insert statements to the form as "INSERT INTO YourEntity VALUES (...), (...), (...)".

Please refer to: http://shengchien.blogspot.com/2010/10/hibernate-batch-processing-with-mysql.html

Sheng Chien
  • 519
  • 4
  • 7