
When I tried to upsert test data (1,000 entities), it took 1m 5s.

So I read many articles and managed to reduce the processing time to 20 seconds.

But it still feels slow to me, and I believe there are better solutions than the methods I used. Does anyone have a good practice for handling this?

I'm also wondering which part makes it slow:

  1. Persistence Context
  2. Additional Select

Thank you!


@Entity class

This entity class collects a user's walk-step health data from the user's phone.

The PK is a composite of userId and recorded_at (the recorded_at part of the PK comes from the request data).

@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord {
    @Id
    @ManyToOne(targetEntity = User.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id", insertable = false, updatable = false)
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Builder
    public StepRecord(User user, ZonedDateTime recordedAt, Long count) {
        this.user = user;
        this.recordedAt = recordedAt;
        this.count = count;
    }
}

Id class

The user field in the Id class (here) is of type UUID, while in the entity class user is of type User. It works okay, but is this going to be a problem?

@NoArgsConstructor
@AllArgsConstructor
@EqualsAndHashCode
public class StepId implements Serializable {
    @Type(type = "uuid-char")
    private UUID user;
    private ZonedDateTime recordedAt;
}

Sample of Request Data

// I'll get user_id from the logged-in user
// user_id(UUID) like 'a167d363-bfa4-48ae-8d7b-2f6fc84337f0'

[{
    "count": 356,
    "recorded_at": "2020-09-16T04:02:34.822Z"
},
{
    "count": 3912,
    "recorded_at": "2020-09-16T08:02:34.822Z"
},
{
    "count": 8912,
    "recorded_at": "2020-09-16T11:02:34.822Z"
},
{
    "count": 9004,
    "recorded_at": "2020-09-16T11:02:34.822Z" // <-- if duplicated, update
}
]

Sample of DB data


|user_id (same user here)            |recorded_at        |count|
|------------------------------------|-------------------|-----|
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 04:02:34|356  | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 08:21:34|3912 | <-insert
|a167d363-bfa4-48ae-8d7b-2f6fc84337f0|2020-09-16 11:02:34|9004 | <-update


Solution 1 : SaveAll() with Batch

  1. application.yml
spring:
  jpa:
    properties:
      hibernate:
        jdbc.batch_size: 20
        jdbc.batch_versioned_data: true
        order_inserts: true
        order_updates: true
        generate_statistics: true
  2. Service
public void saveBatch(User user, List<StepRecordDto.SaveRequest> requestList) {
    List<StepRecord> chunk = new ArrayList<>();

    for (int i = 0; i < requestList.size(); i++) {
        chunk.add(requestList.get(i).toEntity(user));

        if ((i + 1) % BATCH_SIZE == 0) {
            repository.saveAll(chunk);
            chunk.clear();
            //entityManager.flush(); // doesn't help
            //entityManager.clear(); // doesn't help
        }
    }

    if (!chunk.isEmpty()) {
        repository.saveAll(chunk);
        chunk.clear();
    }
}

I read an article that says adding a '@Version' field to the entity class should help, but it still performs the additional selects, and it took almost the same time (20s).

link here ⇒ https://persistencelayer.wixsite.com/springboot-hibernate/post/the-best-way-to-batch-inserts-via-saveall-iterable-s-entities

But it didn't help me. I think that because I pass the PK along with the data, save() always calls merge().

(If I misunderstood about @Version, please tell me)
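For context, Spring Data decides between persist() and merge() via EntityInformation.isNew(). A @Version field of a wrapper type (e.g. Long) left null for new rows makes the entity count as new; if the version field is primitive or pre-set, the check falls back to merge(). Another option is to let the entity answer the question itself by implementing Persistable. The following is a sketch only, not the poster's code, and assumes User#getId() returns the UUID:

```java
// Sketch: StepRecord implementing Spring Data's Persistable so that save()
// calls persist() (no pre-select) instead of merge() for new rows.
@Getter
@NoArgsConstructor
@IdClass(StepId.class)
@Entity
public class StepRecord implements Persistable<StepId> {

    @Id
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "user_id", referencedColumnName = "id")
    private User user;

    @Id
    private ZonedDateTime recordedAt;

    @Column
    private Long count;

    @Transient
    private boolean isNew = true; // true until persisted or loaded

    @Override
    public StepId getId() {
        // assumption: StepId has a (UUID, ZonedDateTime) constructor
        return new StepId(user.getId(), recordedAt);
    }

    @Override
    public boolean isNew() {
        return isNew;
    }

    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
}
```

Note this only avoids the extra select for rows you know are new; for a true upsert (insert-or-update), a native ON DUPLICATE KEY path is still needed.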


Solution 2 : MySQL Native Query (insert into ~ on duplicate key update ~)

I guessed that MySQL's native insert into ~ on duplicate key update ~ might be faster than merge() (which does a select then an insert).

The native query may also check for the duplicate key, but I guess the MySQL engine is well optimized for that.

  1. Repository
public interface StepRecordRepository extends JpaRepository<StepRecord, StepId> {
    @Modifying
    @Transactional
    @Query(value = "insert into step_record(user_id, recorded_at, count) values (:user_id, :recorded_at, :count) on duplicate key update count = :count", nativeQuery = true)
    void upsertNative(@Param("user_id") String userId, @Param("recorded_at") ZonedDateTime recordedAt, @Param("count") Long count);
}
  1. Service
public void saveNative(User user, List<StepRecordDto.SaveRequest> requestList) {
    requestList.forEach(x ->
            repository.upsertNative(user.getId().toString(), x.getRecordedAt(), x.getCount()));
}

Both methods took about 20s for 1,000 entities.
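One knob worth checking with MySQL (not mentioned in the question) is the Connector/J flag rewriteBatchedStatements, which lets the driver collapse batched inserts into multi-value statements. A minimal sketch, assuming a local MySQL instance and a database named mydb (both placeholders):

```yaml
spring:
  datasource:
    # rewriteBatchedStatements=true lets Connector/J rewrite JDBC batches
    # into multi-row INSERTs, which can speed up Hibernate's jdbc.batch_size
    url: jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
```

Without this flag, Hibernate's batching still sends one INSERT per row over the wire, which may explain why batch_size alone did not help much.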


1 Answer


I answered it myself, but I'm still interested in your opinions.

Time to upsert using the native bulk query:

  • 1,000 entities => 0.8 seconds
  • 10,000 entities => 2.5 ~ 4.2 seconds

This is faster than the two methods above in the question, because the data is written directly to the DB without going through the persistence context.

pros

  • no additional select
  • no need to think about the persistence context

cons

  • less readable?
  • too low-level?

How to

Service

@RequiredArgsConstructor
@Service
public class StepRecordService {
    private final StepRecordRepository repository;

    @Transactional
    public void save(User user, List<StepRecordDto.SaveRequest> requestList) {
        int chunkSize = 100;
        Iterator<List<StepRecordDto.SaveRequest>> chunkList = StreamUtils.chunk(requestList.stream(), chunkSize);
        chunkList.forEachRemaining(x-> repository.upsert(user, x));
    }
}

chunk function in StreamUtils

public class StreamUtils {
    public static <T> Iterator<List<T>> chunk(Stream<T> iterable, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return iterable.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
                .values()
                .iterator();
    }
}
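For example, chunking a stream of five items with chunkSize 2 yields three sublists. A self-contained demo (it inlines the same chunk logic as StreamUtils above so it runs on its own):

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ChunkDemo {
    // same logic as StreamUtils.chunk above, inlined for self-containment
    static <T> Iterator<List<T>> chunk(Stream<T> stream, int chunkSize) {
        AtomicInteger counter = new AtomicInteger();
        return stream.collect(Collectors.groupingBy(x -> counter.getAndIncrement() / chunkSize))
                .values()
                .iterator();
    }

    public static void main(String[] args) {
        // splits [1, 2, 3, 4, 5] into chunks of size 2: two full chunks plus a remainder
        Iterator<List<Integer>> chunks = chunk(Stream.of(1, 2, 3, 4, 5), 2);
        chunks.forEachRemaining(System.out::println);
    }
}
```

One caveat: groupingBy collects into a HashMap, whose iteration order is not guaranteed in general; collecting into a TreeMap (or a LinkedHashMap) would make the chunk order explicit.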

Repository

@RequiredArgsConstructor
public class StepRecordRepositoryImpl implements StepRecordRepositoryCustom {
    private final EntityManager entityManager;

    @Override
    public void upsert(User user, List<StepRecordDto.SaveRequest> requestList) {
        String insertSql = "INSERT INTO step_record(user_id, recorded_at, count) VALUES ";
        String onDupSql = " ON DUPLICATE KEY UPDATE count = VALUES(count)";
        StringBuilder paramBuilder = new StringBuilder();

        for (int i = 0; i < requestList.size(); i++) {
            if (paramBuilder.length() > 0)
                paramBuilder.append(",");

            paramBuilder.append("(");
            paramBuilder.append(StringUtils.quote(user.getId().toString()));
            paramBuilder.append(",");
            paramBuilder.append(StringUtils.quote(requestList.get(i).getRecordedAt().toLocalDateTime().toString()));
            paramBuilder.append(",");
            paramBuilder.append(requestList.get(i).getCount());
            paramBuilder.append(")");
        }

        Query query = entityManager.createNativeQuery(insertSql + paramBuilder + onDupSql);
        query.executeUpdate();
    }
}
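Since paramBuilder concatenates values directly into the SQL string, a safer variant would build the statement with positional placeholders and bind the values afterwards (via Query.setParameter or a PreparedStatement). The helper below is hypothetical (buildUpsertSql is not in the original code) and shows only the SQL-string construction, which is plain Java:

```java
public class UpsertSqlBuilder {
    /**
     * Builds "INSERT ... VALUES (?,?,?),(?,?,?),... ON DUPLICATE KEY UPDATE ..."
     * with one (?,?,?) group per row, ready for positional parameter binding.
     */
    public static String buildUpsertSql(int rowCount) {
        StringBuilder sql = new StringBuilder(
                "INSERT INTO step_record(user_id, recorded_at, count) VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) sql.append(",");
            sql.append("(?,?,?)");
        }
        sql.append(" ON DUPLICATE KEY UPDATE count = VALUES(count)");
        return sql.toString();
    }
}
```

Binding parameters instead of concatenating strings avoids SQL injection and quoting bugs (e.g. the manual StringUtils.quote calls above), at the cost of a loop of setParameter calls.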