
Colleagues, I would greatly appreciate advice on the following case.

Our project is based on Spring Data JPA, so my repository implementation is backed by SimpleJpaRepository.

The method under discussion lives in a service and is marked with @Transactional.
To my understanding, Spring creates the EntityManager, flushes the data and commits the transaction if I do not interfere with the process.
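
For context, the setup looks roughly like this (a minimal sketch; the service, repository and entity names are made up for illustration):

import java.util.List;

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ImportService {

    private final MyRecordRepository repository;

    public ImportService(MyRecordRepository repository) {
        this.repository = repository;
    }

    // One Spring-managed transaction per run: the persistence context is
    // flushed and the transaction committed when the method returns.
    @Transactional
    public void runImport(List<MyRecord> records) {
        repository.saveAll(records);
    }
}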

The project reads and parses an external JSON file twice. First run: the JSON is used to fill the table. Second run: a JSON of the same size, with some new values here and there, is used to update the table.

The table has a UNIQUE index on the search field used for the update. The data object is basic, with no @OneToMany relations.
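
A sketch of the data object and repository, assuming javax.persistence and made-up field names (code is the search field, value is what gets updated):

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class MyRecord {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // later switched to SEQUENCE, see below
    private Long id;

    // the search field used for the update, backed by a UNIQUE index
    @Column(unique = true, nullable = false)
    private String code;

    // plain value column, no @OneToMany relations anywhere
    private String value;

    // getters and setters omitted
}

// in a separate file:
import org.springframework.data.jpa.repository.JpaRepository;

public interface MyRecordRepository extends JpaRepository<MyRecord, Long> {
    MyRecord findByCode(String code);
}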

The problem: there is a drastic, steadily growing slowdown during the second (update) run. Every new batch of, say, 1000 records is processed more slowly than the previous one. As a result, the update-run takes approximately 10 times longer than the create-run.

For the create-run I used the straightforward repository method #save, which simply chooses between #persist and #merge, nothing more. Evidently it chooses #persist in my case. In all probability the data are flushed and the transaction is committed by Spring. I turned on the 'generate_statistics' option, and there is 1 flush and the number of created entities is as expected.
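
For reference, the essence of SimpleJpaRepository#save (paraphrased from the Spring Data JPA source) is roughly:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);   // new entity (e.g. null id with IDENTITY) -> INSERT on flush
        return entity;
    } else {
        return em.merge(entity);
    }
}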

How I tried to speed up the update:

Firstly, for the update-run I sliced the data into collections (actually a single collection that is cleared at the end of each slice) and called #saveAll followed by #flush (which is in fact em#flush), as sketched below. This approach is based on these discussions: How to improve performance of Updating data using JPA and Hibernate commit() and flush().
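
The slicing loop looked approximately like this (a sketch inside the service shown above; findByCode, the other names and the chunk size are illustrative):

@Transactional
public void updateInChunks(List<MyRecord> parsedRecords) {
    List<MyRecord> chunk = new ArrayList<>();
    int chunkSize = 1000; // also tried 10

    for (MyRecord incoming : parsedRecords) {
        MyRecord existing = repository.findByCode(incoming.getCode()); // lookup by the UNIQUE search field
        existing.setValue(incoming.getValue());
        chunk.add(existing);

        if (chunk.size() == chunkSize) {
            repository.saveAll(chunk); // SimpleJpaRepository#saveAll -> save() per entity
            repository.flush();        // in fact em.flush()
            chunk.clear();             // clears only my list, not the persistence context
        }
    }

    // write whatever is left in the last, incomplete chunk
    repository.saveAll(chunk);
    repository.flush();
}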

Alas, the time spent was practically the same, the number of JDBC operations was the same, and the number of flushes was as expected (say, 29 flushes when the 'pack' size was 1000, 2900 flushes when the 'pack' size was 10). Strangely, though, the number of entities this time differed from the number of records in the table to be updated.

The log looks like this:

76545093741 nanoseconds spent executing 2860 flushes (flushing a total of 40912292 entities and 0 collections);
756096912142 nanoseconds spent executing 28592 partial-flushes (flushing a total of 408736936 entities and 408736936 collections)

40912292 entities? 408736936 entities and collections? But why? I also wonder what those partial-flushes are: what triggers them, and why does their number vary?

I wonder why manual periodic flushes did not help.

Secondly, in the previous attempt I used a data object whose primary key was auto-generated with the IDENTITY strategy.

This time I decided to try batch processing. I changed the PK generation strategy to SEQUENCE (the entity change is sketched after the properties below) and added a bunch of Spring properties for batching:

jpa:
    properties:
      hibernate:
        jdbc:
          batch_size: 50
          batch_versioned_data: true
          order_inserts: true
          order_updates: true
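
The corresponding entity change looked roughly like this (a sketch; the sequence name and allocationSize are assumptions, with allocationSize aligned to the batch size). With IDENTITY, Hibernate cannot batch inserts at all, hence the switch.

@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "my_record_seq")
@SequenceGenerator(name = "my_record_seq", sequenceName = "my_record_seq", allocationSize = 50)
private Long id;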

Here is the log I received in this case:

250614501 nanoseconds spent preparing 28594 JDBC statements;
8759177291 nanoseconds spent executing 28592 JDBC statements;
3398281 nanoseconds spent executing 2 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
7925542816 nanoseconds spent executing 286 flushes (flushing a total of 4104092 entities and 0 collections);
794086157441 nanoseconds spent executing 28592 partial-flushes (flushing a total of 408736936 entities and 408736936 collections)

So only 2 batches ... and practically no gain in speed.

Evidently something is wrong, or perhaps misconfigured. Can I fix it somehow? Is there some way to increase the update speed?

Thirdly, and finally: probably the most important attempt that I tested.

After the create-run, once the transaction completed, I thought the entities became detached and needed to be merged (as stated here: Does JPA's commit() method make entity detached?). I even restarted Jetty. The only thing my update code did was set a new value during the update-run. That new value was magically transferred to the DB without any call to the repository method saveAndFlush (i.e. entityManager.merge) :) Alas, no gain in processing speed though ...
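
Presumably this is because the entities are (re)loaded inside the update transaction and are therefore still managed, so dirty checking writes the change at flush time. A minimal sketch of what the update code boiled down to (names are illustrative):

@Transactional
public void updateFromJson(List<MyRecord> parsedRecords) {
    for (MyRecord incoming : parsedRecords) {
        MyRecord managed = repository.findByCode(incoming.getCode());
        managed.setValue(incoming.getValue());
        // no save()/saveAndFlush() needed: the entity is managed and
        // dirty checking issues the UPDATE at flush/commit time
    }
}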

1 Answer

As no one proposed a solution, let me tell you what helped me in the end.

I injected the following into the service class:

@PersistenceContext
private EntityManager entityManager;

and called

entityManager.clear();

after every 1000 records.
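
Put together, the update loop ended up roughly like this (a simplified sketch; the flush() before clear() is an extra safety step not spelled out above, so that pending changes are written before the context is emptied):

@PersistenceContext
private EntityManager entityManager;

@Transactional
public void updateFromJson(List<MyRecord> parsedRecords) {
    int processed = 0;
    for (MyRecord incoming : parsedRecords) {
        MyRecord managed = repository.findByCode(incoming.getCode());
        managed.setValue(incoming.getValue());

        if (++processed % 1000 == 0) {
            entityManager.flush(); // push the pending UPDATEs to the DB
            entityManager.clear(); // detach everything so the persistence context stays small
        }
    }
}

Clearing the persistence context keeps the number of managed entities small, so each flush no longer has to dirty-check the whole accumulated set, which is presumably why the processing stopped getting slower as the run progressed.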