When I tried to upsert test data (1,000 entities), it took 1m 5s.
After reading many articles, I reduced the processing time to 20 seconds.
But that still feels slow to me, and I believe there are better solutions than the methods I used. Does anyone have a good practice for handling this?
I'm also wondering which part makes it slow:
- Persistence Context
- Additional Select
Thank you!
@Entity class
This entity class collects a user's walking steps (health data) from the user's phone.
The composite PK is userId and recordedAt (the recordedAt part of the PK comes from the request data).
@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord {

    @Id
    @ManyToOne(targetEntity = User.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id", insertable = false, updatable = false)
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Builder
    public StepRecord(User user, ZonedDateTime recordedAt, Long count) {
        this.user = user;
        this.recordedAt = recordedAt;
        this.count = count;
    }
}
Id class
The user field in the Id class (here) is of type UUID, while in the entity class, user is of the User entity type. It works okay, but is this going to be a problem?
@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode
public class StepId implements Serializable {

    @Type(type = "uuid-char")
    private UUID user;

    private ZonedDateTime recordedAt;
}
Sample of Request Data
// I'll get user_id from the logged-in user
// user_id(UUID) like 'a167d363-bfa4-48ae-8d7b-2f6fc84337f0'
[{
"count": 356,
"recorded_at": "2020-09-16T04:02:34.822Z"
},
{
"count": 3912,
"recorded_at": "2020-09-16T08:02:34.822Z"
},
{
"count": 8912,
"recorded_at": "2020-09-16T11:02:34.822Z"
},
{
"count": 9004,
"recorded_at": "2020-09-16T11:02:34.822Z" // <-- if duplicated, update
}
]
Sample of DB data
|user_id (same user here) |recorded_at |count|
|------------------------------------|-------------------|-----|
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 04:02:34|356 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 08:02:34|3912 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 11:02:34|9004 | <-update
Solution 1 : SaveAll() with Batch
- application.yml
spring:
jpa:
properties:
hibernate:
jdbc.batch_size: 20
jdbc.batch_versioned_data: true
order_inserts: true
order_updates: true
generate_statistics: true
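(I also set this, in case it matters. From what I've read, MySQL's Connector/J only collapses a JDBC batch of single-row inserts into one multi-row statement on the wire when rewriteBatchedStatements is enabled; the host and schema in the URL below are just placeholders.)

```yaml
spring:
  datasource:
    # MySQL-specific: lets Connector/J rewrite a JDBC batch of single-row
    # inserts into one multi-row INSERT (host/schema are placeholders)
    url: jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
```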
- Service
public void saveBatch(User user, List<StepRecordDto.SaveRequest> requestList) {
    List<StepRecord> chunk = new ArrayList<>();
    for (int i = 0; i < requestList.size(); i++) {
        chunk.add(requestList.get(i).toEntity(user));
        if ((i + 1) % BATCH_SIZE == 0) {
            repository.saveAll(chunk);
            chunk.clear();
            //entityManager.flush(); // doesn't help
            //entityManager.clear(); // doesn't help
        }
    }
    if (!chunk.isEmpty()) {
        repository.saveAll(chunk);
    }
}
I read an article that says adding a @Version field to the entity class should help, but it still does the additional selects, and it took almost the same time (20s).
So it didn't help me. I think that because I pass the PK along with the data, save() always calls merge().
(If I misunderstood @Version, please tell me.)
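One workaround I found mentioned (but haven't verified against this exact entity) is Spring Data's Persistable interface: when isNew() returns true, save() calls persist() directly instead of merge(), skipping the lookup select. A minimal sketch of the idea, with a stand-in interface so the snippet is self-contained (the real one is org.springframework.data.domain.Persistable):

```java
// Stand-in for org.springframework.data.domain.Persistable<ID>,
// included only so this sketch compiles on its own.
interface Persistable<ID> {
    ID getId();
    boolean isNew();
}

// Sketch: if the entity reports itself as new, Spring Data's save()
// calls persist() directly instead of merge(), avoiding the select.
class StepRecordSketch implements Persistable<String> {
    private final String id;      // stands in for the composite StepId
    private final boolean isNew;  // would be set when building from request data

    StepRecordSketch(String id, boolean isNew) {
        this.id = id;
        this.isNew = isNew;
    }

    @Override
    public String getId() { return id; }

    // Claiming "new" forces persist(); with an upsert-style workload this
    // is only safe if duplicate keys are handled elsewhere (e.g. by the DB).
    @Override
    public boolean isNew() { return isNew; }
}
```

The catch for an upsert workload is exactly the duplicate case: forcing persist() on a row that already exists throws a constraint violation instead of updating, so this alone doesn't solve my update rows.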
Solution 2 : MySQL Native Query (insert into ~ on duplicate key update ~)
I guess insert into ~ on duplicate key update ~ as a MySQL native query may be faster than merge() (which does a select, then an insert/update).
The native query may also have to check for a duplicate key, but I guess the MySQL engine is well optimized for that.
- Repository
public interface StepRecordRepository extends JpaRepository<StepRecord, StepId> {

    @Modifying
    @Query(value = "insert into step_record(user_id, recorded_at, count) values (:user_id, :recorded_at, :count) on duplicate key update count = :count", nativeQuery = true)
    void upsertNative(@Param("user_id") String userId, @Param("recorded_at") ZonedDateTime recordedAt, @Param("count") Long count);
}
- Service
public void saveNative(User user, List<StepRecordDto.SaveRequest> requestList) {
    requestList.forEach(x ->
            repository.upsertNative(user.getId().toString(), x.getRecordedAt(), x.getCount()));
}
Both methods took 20s for 1,000 entities.
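For what it's worth, the per-row native query above still pays one round trip per entity. A direction I'm considering (sketch only, class and method names are mine) is sending one multi-row upsert per chunk; the parameters for each row would then be bound through a PreparedStatement or JdbcTemplate.batchUpdate, so only the SQL text is built here:

```java
// Sketch: build one multi-row MySQL upsert for a chunk of rows, so a
// 1,000-entity payload becomes a handful of statements instead of 1,000.
// Binding (user_id, recorded_at, count) per row is left to the caller.
final class UpsertSql {

    static String build(int rows) {
        if (rows <= 0) throw new IllegalArgumentException("rows must be > 0");
        StringBuilder sql = new StringBuilder(
                "insert into step_record(user_id, recorded_at, count) values ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sql.append(", ");
            sql.append("(?, ?, ?)");
        }
        // values(count) refers to the count this row attempted to insert
        sql.append(" on duplicate key update count = values(count)");
        return sql.toString();
    }
}
```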