I have a long running (but rather simple) application that uses Hibernate (via JPA). It was experiencing rather dramatic slowdown as it ran. I've been able to narrow down to requiring an occasional entityManager.clear()
call. When Hibernate's entity manager is tracking 100,000 entities, it is ~100x slower than when it is tracking only a few (see results below). My question is: why does Hiberate slow down so much when it is tracking a lot of entities? And are there any other ways around it?
!!! Update: I've been able to narrow this down to Hibernate's auto-flushing code. !!!
Specifically to org.hibernate.event.internal.AbstractFlushingEventListener
's flushEntities()
method (at least in Hibernate 4.1.1.Final). In it there is a loop that iterates over ALL entities in the persistence context, performing some extensive checks around flushing each of them (even though all the entities are already flushed in my example!).
So partially answering the second part of my question, the performance problem can be addressed by setting the flush mode to FlushModeType.COMMIT
on the query (see updated results below). e.g.
Place place = em.createQuery("from Place where name = :name", Place.class)
.setParameter("name", name)
.setFlushMode(FlushModeType.COMMIT) // <-- yay!
.getSingleResult();
... but this seems like a rather ugly solution--passing the responsibility for knowing if things are flushed to the query methods instead of keeping it in the updating methods. It also pretty much means that I either have to set flush mode to COMMIT on all query methods, or, more likely, set it on the EntityManager.
This makes me wonder: is this expected behavior? Am I doing something wrong with flushing or how I define entities? Or is this a limitation of (or possibly bug in) Hibernate?
The sample code that I used to isolate the problem follows:
The Test Entity
@Entity @Table(name="place") @Immutable
public class Place {
private Long _id;
private String _name;
@Id @GeneratedValue
public Long getId() { return _id; }
public void setId(Long id) { _id = id; }
@Basic(optional=false) @Column(name="name", length=700,
updatable=false, nullable=false, unique=true,
columnDefinition="varchar(700) character set 'ascii' not null")
public String getName() { return _name; }
public void setName(String name) { _name = name; }
@Override
public boolean equals(Object o) { /* ... */ }
@Override
public int hashCode() { return getName().hashCode(); }
}
The Benchmark Code
The test code I have generates 100000 random place names and inserts them. Then queries out 5000 of those randomly by name. There is an index on the name column.
Place place = em.createQuery(
"select p from Place p where p.name = :name", Place.class)
.setParameter("name", name)
.getSingleResult();
For comparison, and to make sure that it wasn't something in the database, I ran the following JDBC based query (under em.unwrap(Session.class).doWork(...)
) over a separate randomly selected 5000 place names:
PreparedStatement ps = c.prepareStatement(
"select id, name from place where name = ?");
ps.setString(1, name);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
Place place = new Place();
place.setId(rs.getLong(1));
place.setName(rs.getString(2));
}
rs.close();
ps.close();
(Note, I do create and close a PreparedStatement for each of the 5000 queries for the benchmark).
The Results
All results below are averaged over 5000 queries. The JVM was given -Xmx1G
Seconds/Query Approach
0.000160s JDBC
0.000286s Hibernate calling clear() after import and every 100 queries
0.000653s Hibernate calling clear() once after the import
0.012533s Hibernate w/o calling clear() at all
0.000292s Hibernate w/o calling clear(), and with flush-mode COMMIT
Other observations: the during the Hibernate queries (w/o any clear calls), the java process pegged a core at near 100% utilization. The JVM never exceeded 500MB heap. There was a lot of GC activity during the queries too, but CPU utilization was clearly dominated by the Hibernate code.