I'm currently facing the well-known and common Hibernate insert batch problem.
I need to save batches 5 millions of rows long. I'm first trying with a much lighter payload. Since I have to insert entities of only 2 types (first all records of type A, then all records of type B, all pointing to common type C ManyToOne
parent), I would like to take the most advantage from JDBC batch insert.
I have already read lots of documentation, but none that I have tried worked.
- I know that in order to use batch inserts I must not use an entity generator. So I removed the
AUTO_INCREMENT
ID and I'm setting the ID with a trick:SELECT MAX(ID) FROM ENTITIES
and increment every time. - I know that I must flush the session regularly. I'll post code ahead, but anyway I perform a transaction every 500 elements.
- I know that I have to set
hibernate.jdbc.batch_size
consistent with my application's bulk size, so I set it in theLocalSessionFactoryBean
(Spring ORM integration) - I know I have to enable rewriting batched statements in connection URL.
Here are my entities
Common parent entity. This gets inserted first in a single transaction. I don't care about auto increment column here. Only one record per batch job
@Entity
@Table(...)
@SequenceGenerator(...)
public class Deal
{
@Id
@Column(
name = "DEAL_ID",
nullable = false)
@GeneratedValue(
strategy = GenerationType.AUTO)
protected Long id;
................
}
One of the children (let's say 2.5M records per batch)
@Entity
@Table(
name = "TA_LOANS")
public class Loan
{
@Id
@Column(
name = "LOAN_ID",
nullable = false)
protected Long id;
@ManyToOne(
optional = false,
targetEntity = Deal.class,
fetch = FetchType.LAZY)
@JoinColumn(
name = "DEAL_ID",
nullable = false)
protected Deal deal;
.............
}
The other children type. Let's say the other 2.5M records
@Entity
@Table(
name = "TA_BONDS")
public class Bond
{
@Id
@Column(
name = "BOND_ID")
@ManyToOne(
fetch = FetchType.LAZY,
optional = false,
targetEntity = Deal.class)
@JoinColumn(
name = "DEAL_ID",
nullable = false,
updatable = false)
protected Deal deal;
}
Simplified code that inserts records
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID)
Deal deal = null;
List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value
List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);
for (String msg: inputStreamReader)
{
log.debug(msg.toString());
if (this is a deal)
{
Deal deal = parseDeal(msg.getMessage());
deal = dealManager.persist(holder.deal); //Called in a separate transaction using Spring annotation @Transaction(REQUIRES_NEW)
}
else if (this is a loan)
{
Loan loan = parseLoan(msg.getMessage());
loan.setId(++loanIdCounter);
loan.setDeal(deal);
loanList.add(loan);
if (loanList.size() == COMMIT_BATCH_SIZE)
{
loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time
loanList.clear();
}
}
else if (this is a bond)
{
Bond bond = parseBond(msg.getMessage());
bond.setId(++bondIdCounter);
bond.setDeal(deal);
bondList.add(bond);
if (bondList.size() == COMMIT_BATCH_SIZE) //As above
{
bondManager.bulkInsert(bondList);
bondList.clear();
}
}
}
if (!bondList.isEmpty())
bondManager.bulkInsert(bondList);
if (!loanList.isEmpty())
loanManager.bulkInsert(loanList);
//Flush remaining items, not important
Implementation of bulkInsert
:
@Override
public void bulkInsert(Collection<Bond> bonds)
{
// StatelessSession session = sessionFactory.openStatelessSession();
Session session = sessionFactory.openSession();
try
{
Transaction t = session.beginTransaction();
try
{
for (Bond bond : bonds)
// session.persist(bond);
// session.insert(bond);
session.save(bond);
}
catch (RuntimeException ex)
{
t.rollback();
}
finally
{
t.commit();
}
}
finally
{
session.close();
}
}
As you can see from comments, I have tried several combinations of stateful/stateless session
. None worked.
My dataSource
is a ComboPooledDataSource
with following URL
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&rewriteBatchedStatements=true" />
My SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">
<b:property name="dataSource" ref="phoenixDataSource" />
<b:property name="hibernateProperties">
<b:props>
<b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->
<b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>
<b:prop key="hibernate.jdbc.batch_size">500</b:prop>
<b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>
<b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>
<b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>
<b:prop key="hibernate.cache.use_query_cache">false</b:prop>
<b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>
<b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>
<b:prop key="hibernate.order_inserts">true</b:prop>
<b:prop key="hibernate.order_updates">true</b:prop>
</b:props>
</b:property>
</b:bean>
Even if my project-wide class extends LocalSessionFactoryBean
, it does not override its methods (only adds few project-wide methods)
I'm getting mad since a few days. I read a few articles and none helped me enable batch inserts. I run all of my code from JUnit tests instrumented with Spring context (so I can @Autowire
my classes). All of my attempts only produce a lots of separate INSERT
statements
- https://stackoverflow.com/questions/12011343/how-do-you-enable-batch-inserts-in-hibernate
- https://stackoverflow.com/questions/3469364/faster-way-to-batch-saves-with-hibernate
- https://forum.hibernate.org/viewtopic.php?p=2374413
- https://stackoverflow.com/questions/3026968/high-performance-hibernate-insert
What am I missing?