0

Dear fellow programmers,

i have been given the task to update about 10 000 - 100 000 records in an Oracle 11g database EACH minute. The current state of those records are held in a global ArrayList so i don't need to SELECT all records on every update from the DB. A scheduler updates those records in the ArrayList at the beginning of each minute and then starts to update the records in the database.

I cannot change this fact, it is a customer requirement. To achieve high performance, those updates should be done by using the native batch update feature.

I am using a TomEE plume 7.0.2 application server with EclipseLink 2.6.3 (this version is included with TomEE).

Code:

@PersistenceContext(unitName = "MES_Tables")
private EntityManager em;

...

@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
    Query q = em.createNativeQuery(
    "UPDATE " + 
        "SCHEMA.PROPERTIES_GRP_CONT " + 
    "SET " + 
        "STRVAL = ? " + //<-- SQL-Param 
    "WHERE " + 
        "STATES_ID = 1 " +
        "AND PROPERTIES_ID = ? " + //<-- SQL-Param
        "AND PROPERTIES_GRP_ID = ?"); //<-- SQL-Param

    for(BatchInfo bi : biList){ 
        int rowsUpdated = q
        .setParameter(1, Long.toString(bi.getLifetime()))
        .setParameter(2, bi.getPropertiesId())
        .setParameter(3, bi.getBatchId())
        .executeUpdate();
    }
}

Unfortunately those updates are executed as single updates and no batching is happening. So 10 000 updates are taking about 40-50 seconds. To my understanding the EntityManager (em) should automatically create batch updates if you execute multiple updates within a single for each loop. Even simplifying the SQL UPDATE to a statement without any parameters, so that always the same update is executed, did not change the fact that single updates were executed.

persistence.xml

<?xml version="1.0" encoding="UTF-8"?>
<persistence version="2.1"
    xmlns="http://xmlns.jcp.org/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/persistence http://xmlns.jcp.org/xml/ns/persistence/persistence_2_1.xsd">
    <persistence-unit name="MES_Tables" transaction-type="JTA">
        <jta-data-source>MES_Connection</jta-data-source>
        <exclude-unlisted-classes>false</exclude-unlisted-classes>
        <properties>
            <property name="javax.persistence.schema-generation.database.action" value="none" />
            <property name="eclipselink.ddl-generation" value="none" />
            <property name="eclipselink.logging.level" value="WARNING" />
            <property name="eclipselink.logging.level.sql" value="FINE" />
            <property name="eclipselink.logging.parameters" value="true" />

            <property name="javax.persistence.query.timeout" value="1800000" />
            <property name="eclipselink.jdbc.connections.wait-timeout" value="1800000" />
            <property name="eclipselink.jdbc.batch-writing" value="JDBC" />
            <property name="eclipselink.jdbc.batch-writing.size" value="600" />

            <property name="eclipselink.logging.logger" value="mes.core.logging.EclipseLinkLogger"/>
        </properties>
    </persistence-unit>
</persistence>

To test if batch updating is working at all, i refactored the code to use a managed JPA entity instead of the native SQL UPDATE. The problem here is, that i need to perform a em.merge(entity) on each entity for it to be managed again. This is because the entities become unmanaged after committing (which is happening each minute in the scheduler).

This causes 10 000 slow SELECTs (30-40 seconds). After those SELECTs are finished, EclipseLink performs a fast batch update (3-4 seconds).

The last days i was trying to prevent EclipseLink from performing those SELECTs and just issue the update but without luck. From another stackoverflow post i found a method to do updates without the SELECT:

Perform UPDATE without SELECT in eclipselink

    EntityManagerImpl emImpl = ((EntityManagerImpl) em.getDelegate());
    UnitOfWork uow = emImpl.getUnitOfWork();
    AbstractSession as = uow.getParent();

    for(BatchInfo bi : biList)
        as.updateObject(bi);

This unfortunately did not work also because of the following exception: org.eclipse.persistence.internal.sessions.IsolatedClientSession cannot be cast to org.eclipse.persistence.internal.sessions.UnitOfWorkImpl

I am out of options now and hopefully someone of you can give me a hint where to look at and solve this problem. It would be greatly appreciated.

I would rather have the native batch update working than the manipulating EclipseLink to not perform any SELECTs on merge.

Markus P.
  • 31
  • 7
  • JPA is forced to execute your statement one by one on each updateObject call, so it can't collect them into larger batches. You should look at changing your operation so that the scheduler uses the same EntityManager instance for reads and updates, avoiding unnecessary selects. Otherwise if you MUST use a native SQL for this operation (and have it separate from the reads), JPA might not be right for you and you'll want to get the connection and manage the batch statement execution directly. – Chris Aug 16 '17 at 14:30
  • Thank you for your comment @Chris. I have a question regarding your comment: If the scheduler uses the same EntityManager then the entities will get detached after a commit and i will need to reselect or merge them again on the next scheduler call? My intention is to read data from the db only once and then just update those records every minute without selecting or merging them first. – Markus P. Aug 16 '17 at 15:34
  • you can look into using an extended EntityManager context, and keeping the EntityManager around. This EntityManager will keep everything read into it managed, allowing you to attach it to a transaction to pick up changes made on those managed entities. Also note EclipseLink uses a shared cache, which can prevent the need for reads on merges in some cases. – Chris Aug 16 '17 at 16:37

1 Answers1

3

After searching for a long time and trying different approaches (thanks to Chris) i found the simplest solution if you want to stay on the native side of JPA:

@Schedule(second="0", minute="*", hour="*", persistent=false)
public void startUpdate(){
    String updateSql =
    "UPDATE " + 
        "SCHEMA.PROPERTIES_GRP_CONT " + 
    "SET " + 
        "STRVAL = ? " + //<-- SQL-Param 
    "WHERE " + 
        "STATES_ID = 1 " +
        "AND PROPERTIES_ID = ? " + //<-- SQL-Param
        "AND PROPERTIES_GRP_ID = ?"; //<-- SQL-Param

    java.sql.Connection connection = em.unwrap(java.sql.Connection.class);
    PreparedStatement prepStatement = connection.prepareStatement(updateSql);

    for(BatchInfo bi : biList){ 
        prepStatement.setString(1, Long.toString(bi.getLifetime()));
        prepStatement.setLong(2, bi.getPropertiesId());
        prepStatement.setLong(3, bi.getBatchId());

        prepStatement.addBatch();
    }

    prepStatement.executeBatch();
}

Warning: the important part (em.unwrap) may be EclipseLink specific and require JPA 2.1 or higher!

Markus P.
  • 31
  • 7