Spring JPA bulk upserts is slow (1,000 entities took 20 seconds)

Question

When I tried to upsert test data(1,000 entities), it took 1m 5s.

So I read many articles, and then I reduce processing time to 20 seconds.

But it's still slow to me and I believe there are more good solutions than methods that I used. Does any one have a good practice to handle that?

I'm also wondering which part makes it slow?

Persistence Context
Additional Select

Thank you!

@Entity class

This entity class is to collect to user's walk step of health data from user's phone.

The PK is userId and recorded_at (recorded_at of the PK is from request data)

@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord {
    @Id
    @ManyToOne(targetEntity = User.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id", insertable = false, updatable = false)
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Builder
    public StepRecord(User user, ZonedDateTime recordedAt, Long count) {
        this.user = user;
        this.recordedAt = recordedAt;
        this.count = count;
    }
}

Id class

user field in Id class(here), it's UUID type. In Entity class, user is User Entity type. It works okay, is this gonna be a problem?

@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode
public class StepId implements Serializable {
    @Type(type = "uuid-char")
    private UUID user;
    private ZonedDateTime recordedAt;
}

Sample of Request Data

// I'll get user_id from logined user
// user_id(UUID) like 'a167d363-bfa4-48ae-8d7b-2f6fc84337f0'

[{
    "count": 356,
    "recorded_at": "2020-09-16T04:02:34.822Z"
},
{
    "count": 3912,
    "recorded_at": "2020-09-16T08:02:34.822Z"
},
{
    "count": 8912,
    "recorded_at": "2020-09-16T11:02:34.822Z"
},
{
    "count": 9004,
    "recorded_at": "2020-09-16T11:02:34.822Z" // <-- if duplicated, update
}
]

Sample of DB data


|user_id (same user here)            |recorded_at        |count|
|------------------------------------|-------------------|-----|
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 04:02:34|356  | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 08:21:34|3912 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 11:02:34|9004 | <-update

Solution 1 : SaveAll() with Batch

application.properties

spring:
  jpa:
    properties:
      hibernate:
        jdbc.batch_size: 20
        jdbc.batch_versioned_data: true
        order_inserts: true
        order_updates: true
        generate_statistics: true

Service

public void saveBatch(User user, List<StepRecordDto.SaveRequest> requestList) {
        List<StepRecord> chunk = new ArrayList<>();

        for (int i = 0; i < requestList.size(); i++) {
            chunk.add(requestList.get(i).toEntity(user));

            if ( ((i + 1) % BATCH_SIZE) == 0 && i > 0) {
                repository.saveAll(chunk);
                chunk.clear();
                //entityManager.flush(); // doesn't help
                //entityManager.clear(); // doesn't help 
            }
        }

        if (chunk.size() > 0) {
            repository.saveAll(chunk);
            chunk.clear();
        }
    }

I read the article that says if I add '@Version' field in Entity class, but it still additional selects. and it took almost the same time (20s).

link here ⇒ https://persistencelayer.wixsite.com/springboot-hibernate/post/the-best-way-to-batch-inserts-via-saveall-iterable-s-entities

but it doesn't help me. I think I pass the PK key with data, so It always call merge().

(If I misunderstood about @Version, please tell me)

Solution 2 : Mysql Native Query (insert into~ on duplicate key update~)

I guess Insert into ~ on duplicate key update ~ in mysql native query is may faster than merge() <- select/insert

mysql native query may also select for checking duplicate key but I guess mysql engine is optimized well.

Repository

public interface StepRecordRepository extends JpaRepository<StepRecord, Long> {
    @Query(value = "insert into step_record(user_id, recorded_at, count) values (:user_id, :recorded_at, :count) on duplicate key update count = :count", nativeQuery = true)
    void upsertNative(@Param("user_id") String userId, @Param("recorded_at") ZonedDateTime recorded_at, @Param("count") Long count);
}

Service

public void saveNative(User user, List<StepRecordDto.SaveRequest> requestList) {
        requestList.forEach(x ->
                repository.upsertNative(user.getId().toString(), x.getRecordedAt(), x.getCount()));
    }

Both of two method took 20s for 1,000 entities.

Did you try to override equals() and hashCode() for your entity? — NikMashei, Sep 17 '20 at 10:12
And maybe this link will be useful for you: https://stackoverflow.com/questions/14936266/spring-data-jpa-manual-commit-transaction-and-restart-new-one — NikMashei, Sep 17 '20 at 10:14
I read your link. I already tried `entityManager.flush()` and `entityManager.clear()`, but it doesn't help :(. still 20 seconds — hynuah_iia, Sep 17 '20 at 10:20
Yes, they are in the same network, and have the same test data. — hynuah_iia, Sep 17 '20 at 16:12
equals() and hashCode() are the must for hibernate, this is how hibernate determines "identity" of entity — Bartun, Sep 17 '20 at 21:07
@Zogger Sorry, my comment was wrong. I override equals() and hashCode() using EqualsAndHashCode. I added my id class on the content. — hynuah_iia, Sep 18 '20 at 01:47
@Bartun thank you. I'll search about it more. I already added equals() and hashCode() on my Id class. in `Id class`, user is **UUID type**, and in `Entity class`, user is **User Entity**. is it okay? — hynuah_iia, Sep 18 '20 at 01:48

score 4 · Answer 1 · answered Sep 23 '20 at 13:54

Answered myself, but I still wait for your opinion.

Time to upsert to use native query

1,000 entities => 0.8 seconds
10,000 entities => 2.5 ~ 4.2 seconds

This is faster than the above two methods in the question. This is because data is stored directly in DB without going through persistence context.

pros

don't additional select
don't need to consider about Persistence Context

cons

unreadable?
too raw?

How to

Service

@RequiredArgsConstructor
@Service
public class StepRecordService {
    private final StepRecordRepository repository;

    @Transactional
    public void save(User user, List<StepRecordDto.SaveRequest> requestList) {
        int chunkSize = 100;
        Iterator<List<StepRecordDto.SaveRequest>> chunkList = StreamUtils.chunk(requestList.stream(), chunkSize);
        chunkList.forEachRemaining(x-> repository.upsert(user, x));
    }
}

chunk function in StreamUtils

public class StreamUtils {
    public static <T> Iterator<List<T>> chunk(Stream<T> iterable, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return iterable.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
                .values()
                .iterator();
    }
}

Repository

@RequiredArgsConstructor
public class StepRecordRepositoryImpl implements StepRecordRepositoryCustom {
    private final EntityManager entityManager;

      @Override
    public void upsert(User user, List<StepRecordDto.SaveRequest> requestList) {
        String insertSql = "INSERT INTO step_record(user_id, recorded_at, count) VALUES ";
        String onDupSql = "ON DUPLICATE KEY UPDATE count = VALUES(count)";
        StringBuilder paramBuilder = new StringBuilder();

          for ( int i = 0; i < current.size(); i ++ ) {
              if (paramBuilder.length() > 0)
                  paramBuilder.append(",");

              paramBuilder.append("(");
              paramBuilder.append(StringUtils.quote(user.getId().toString()));
              paramBuilder.append(",");
              paramBuilder.append(StringUtils.quote(requestList.get(i).getRecordedAt().toLocalDateTime().toString()));
              paramBuilder.append(",");
              paramBuilder.append(requestList.get(i).getCount());
              paramBuilder.append(")");
          }

          Query query = entityManager.createNativeQuery(insertSql + paramBuilder + onDupSql);
          query.executeUpdate();
    }
}