
In my Java 8 Spring Boot application, I have a list of 40,000 records. For each record, I have to call an external API and save the result to the DB. How can I do this as quickly as possible? Each API call takes about 20 seconds to complete. I used a parallel stream to reduce the time, but it made no considerable difference.

if (!mainList.isEmpty()) {
    AtomicInteger counter = new AtomicInteger();
    List<List<PolicyAddressDto>> secondList = 
            new ArrayList<List<PolicyAddressDto>>(
                    mainList.stream()
                        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / subArraySize))
                        .values());
    for (List<PolicyAddressDto> listOfList : secondList) {
        listOfList.parallelStream()
                .forEach(t -> {
                    callAtheniumData(t, listDomain1, listDomain2); // listDomain2 and listDomain1 declared
                                                                    // globally
                });
        
        if (!listDomain1.isEmpty()) {
            listDomain1Repository.saveAll(listDomain1);
        }
        if (!listDomain2.isEmpty()) {
            listDomain2Repository.saveAll(listDomain2);
        }
    }
}
ernest_k
    The first thing to do is understand exactly where the time is spent, for example by using a profiling tool to identify what is going on. Maybe you need to write custom parallelisation that uses a higher number of threads for these API calls. (I would also check whether you can create a "bulk API" call that you invoke less often, passing more data to it in one shot.) Fixing performance issues **always** requires *context*, and that we do not have. It is your setup, so you can make the necessary experiments. – GhostCat Oct 20 '20 at 11:04
    This can be achieved using Project Reactor. Spring provides fantastic support for it through Spring WebFlux. https://github.com/reactor/reactor-core#head-first-spring--reactor – Aniket Sahrawat Oct 20 '20 at 11:04
    Assuming that the 20 seconds are mostly spent in the API calls, as you state, you can indeed gain speed by doing this in parallel, provided that the API calls can be handled in parallel without slowing each other down. – TreffnonX Oct 20 '20 at 11:06
    Put some logging timestamps into your code and you'll roughly see which part is slowing the process down. If it is the API call, measure it again under load; you may find that sending more parallel requests slows down the target system. If a call takes 20 seconds to complete, the target API may have its own performance problems, and parallelization may even reduce overall performance. – Michal Krasny Oct 20 '20 at 11:12
    What is the point of the first grouping operation? Why don’t you just use `mainList.parallelStream().forEach(t -> callAtheniumData(t, listDomain1, listDomain2));` in the first place? – Holger Oct 20 '20 at 13:01

3 Answers


Solving a problem in parallel always involves performing more actual work than doing it sequentially. Overhead is involved in splitting the work among several threads and joining or merging the results. Problems like converting short strings to lower-case are small enough that they are in danger of being swamped by the parallel splitting overhead.

As far as I can see, the API call response is not being saved, and the API calls are independent of each other.

We could try creating a new thread for each API call:

for (List<PolicyAddressDto> listOfList : secondList) {
    listOfList.parallelStream()
            .forEach(t -> {
                // start a dedicated thread for each API call
                new Thread(() -> callAtheniumData(t, listDomain1, listDomain2)).start();
            });
}
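One thread per call could mean tens of thousands of live threads for 40,000 records. Below is a minimal sketch of a bounded alternative, assuming a fixed-size pool is acceptable. `BoundedApiCaller`, `callAll`, and `poolSize` are hypothetical names for illustration, and `apiCall` stands in for `callAtheniumData`. Results are returned to the caller instead of being appended to shared lists, which sidesteps concurrent writes to those lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class BoundedApiCaller {

    /**
     * Applies apiCall to every record using at most poolSize threads and
     * returns the results in the same order as the input list.
     */
    static <T, R> List<R> callAll(List<T> records, Function<T, R> apiCall, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Queue one task per record; only poolSize of them run at a time.
            List<Future<R>> futures = new ArrayList<>();
            for (T record : records) {
                Callable<R> task = () -> apiCall.apply(record);
                futures.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until each call is done.
            List<R> results = new ArrayList<>(records.size());
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for API calls", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an API call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The returned list can then be passed to `saveAll` from a single thread. The right pool size depends on how many concurrent requests the external API tolerates, which has to be measured.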
Ayush v

That's because a parallel stream splits the work over the common fork/join pool, which by default has one worker thread per core minus one. If every call to the external API takes 20 seconds and you have 4 cores, that means only 3 worker threads (plus the calling thread) making requests, each waiting 20 seconds.
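To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes every call takes exactly 20 seconds and that requests run in full "waves" of equal size, which is a simplification; `ThroughputEstimate` and `estimateSeconds` are hypothetical names.

```java
import java.util.concurrent.ForkJoinPool;

final class ThroughputEstimate {

    /** Rough wall-clock estimate: the calls run in waves of `concurrency` requests. */
    static long estimateSeconds(long records, int concurrency, long latencySeconds) {
        long waves = (records + concurrency - 1) / concurrency; // ceiling division
        return waves * latencySeconds;
    }

    public static void main(String[] args) {
        // Default parallelism of the common pool that parallel streams use on this machine.
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        for (int concurrency : new int[] {1, 4, 100}) {
            System.out.println(concurrency + " concurrent -> "
                    + estimateSeconds(40_000, concurrency, 20) + " s");
        }
    }
}
```

Under these assumptions, 40,000 sequential calls take 800,000 seconds (over 9 days), 4 concurrent calls take 200,000 seconds (about 55.6 hours), and 100 concurrent calls take 8,000 seconds (about 2.2 hours), provided the external API can sustain that load.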

You can increase the concurrency of your calls as described in https://stackoverflow.com/a/21172732/574147, but I think you're just moving the problem.
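One commonly used way to raise a parallel stream's concurrency is to start the stream from inside a dedicated `ForkJoinPool`. The sketch below assumes that approach; note that it relies on how the stream implementation currently behaves rather than on a documented guarantee. `CustomPoolStream`, `mapInPool`, and `parallelism` are hypothetical names, and `apiCall` stands in for the real API call.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.function.Function;
import java.util.stream.Collectors;

final class CustomPoolStream {

    /** Runs a parallel stream inside a dedicated pool with the given parallelism. */
    static <T, R> List<R> mapInPool(List<T> items, Function<T, R> apiCall, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The stream is started from a pool thread, so its subtasks run in this pool
            // instead of the common pool.
            Callable<List<R>> task = () ->
                    items.parallelStream().map(apiCall).collect(Collectors.toList());
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the stream", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a stream task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Fork/join pools are designed for CPU-bound work, so for calls that block for 20 seconds a plain `ExecutorService` is often the more conventional choice.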

20 seconds is a really slow "typical" response time for an API. If the processing is complex and CPU-bound, how could that service handle 10 concurrent requests while keeping the same performance? It probably couldn't.

Otherwise, if the processing is I/O-bound and takes 20 seconds, you probably need a service that can accept (and process!) a list of elements in a single call.
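If such a bulk endpoint existed (an assumption; the question does not say that it does), the client side would mostly come down to splitting the records into batches. A minimal sketch, with `Batches` and `partition` as hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then go to the bulk endpoint in one request. The same helper also replaces the `groupingBy` with a shared counter used in the question.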

Fabio Bonfante

Each of the API calls will take about 20 secs to complete.

Your external API is where you are being bottlenecked. There's really nothing your code can do on the client side to speed it up except to parallelize the process. You've already done that, so if the external API is within your organization, you need to look into performance improvements there. If not, you can do something like offloading the processing via Kafka to Apache NiFi or StreamSets, so that your Spring Boot API doesn't have to wait for hours to process the data.

Mike Thomsen