I have 270,000 records in a CSV file with columns user_id, book_ISBN, and book_rating, and I need to insert them into a many-to-many table. I parsed the data with the OpenCSV library, and the result is a list.
public List<UserRatingDto> uploadRatings(MultipartFile file) throws IOException {
    BufferedReader fileReader = new BufferedReader(
            new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8));
    List<UserRatingDto> ratings = new CsvToBeanBuilder<UserRatingDto>(fileReader)
            .withType(UserRatingDto.class)
            .withSeparator(';')
            .withIgnoreEmptyLine(true)
            .withSkipLines(1)   // skip the header row
            .build()
            .parse();
    return ratings;
}
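For reference, UserRatingDto is a plain bean; it looks roughly like this (a simplified sketch — the field types are what I assume from the data, and it uses positional binding since the header row is skipped):

import com.opencsv.bean.CsvBindByPosition;

public class UserRatingDto {

    @CsvBindByPosition(position = 0)   // user_id column
    private Long userId;

    @CsvBindByPosition(position = 1)   // book_ISBN column
    private String bookISBN;

    @CsvBindByPosition(position = 2)   // book_rating column
    private Integer bookRating;

    // getters and setters omitted
}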
Parsing is not the problem; it takes approximately one minute. The slow part is the insert: to write each record into the table, I have to fetch the matching book and user from the DB in order to form the relationship. I tried making the method async with the @Async annotation, I tried a parallel stream, and I tried collecting the objects first and bulk-inserting them with saveAll(), but it still takes far too long.
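The saveAll() attempt looked roughly like this (a simplified sketch; the method name is just for illustration, and the per-row lookups are unchanged — only the inserts are deferred to a single repository call):

public void saveRatingsBulk(final MultipartFile file) throws IOException {
    List<UserRatingDto> userRatingDtos = uploadRatings(file);
    List<BookRating> toInsert = new ArrayList<>();

    for (UserRatingDto dto : userRatingDtos) {
        // Still one SELECT for the user and one for the book on every row
        UserEntity user = userRepository.findByUserId(dto.getUserId());
        bookRepository.findByISBN(dto.getBookISBN()).ifPresent(book -> {
            BookRating rating = new BookRating();
            rating.setRating(dto.getBookRating());
            rating.setUser(user);
            rating.setBook(book);
            toInsert.add(rating);
        });
    }

    // A single saveAll() call, but without JDBC batching enabled
    // Hibernate still issues one INSERT statement per entity
    bookRatingRepository.saveAll(toInsert);
}

This is what I have now: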
public void saveRatings(final MultipartFile file) throws IOException {
    List<UserRatingDto> userRatingDtos = uploadRatings(file);
    userRatingDtos.parallelStream().forEach(bookRating -> {
        // Two lookups per CSV row: one for the user, one for the book
        UserEntity user = userRepository.findByUserId(bookRating.getUserId());
        bookRepository.findByISBN(bookRating.getBookISBN()).ifPresent(book -> {
            BookRating ratingEntity = new BookRating();
            ratingEntity.setRating(bookRating.getBookRating());
            ratingEntity.setUser(user);
            ratingEntity.setBook(book);
            book.getRatings().add(ratingEntity);
            user.getRatings().add(ratingEntity);
            bookRatingRepository.save(ratingEntity);   // one INSERT per row
        });
    });
}
Is there anything I can change to make this faster?
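In case the entity mappings matter, the join entity looks roughly like this (a simplified sketch; the id strategy and fetch types shown are assumptions, and other fields are omitted):

import jakarta.persistence.*;   // javax.persistence.* on Spring Boot 2.x

@Entity
public class BookRating {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)   // assumption
    private Long id;

    private int rating;

    @ManyToOne(fetch = FetchType.LAZY)   // assumption
    private UserEntity user;

    @ManyToOne(fetch = FetchType.LAZY)   // assumption
    private Book book;

    // getters and setters omitted
}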