Assume I want to send multiple database queries (or webservice requests) in parallel and aggregate the results afterwards. Should I use the stream API or CompletableFuture?

STREAM:

List<Result> result = requests.parallelStream()
                              .map(req -> query(req.getFirstname(), req.getLastname()))
                              .collect(toList());

//a database, or a webservice
private Result query(String firstname, String lastname);

FUTURES:

List<CompletableFuture<Result>> futures = new ArrayList<>();
for (QueryReq req : requests) { //given input
    futures.add(CompletableFuture
                        .supplyAsync(() -> query(req.getFirstname(), req.getLastname())));
}

//wait for all futures to complete and collect the results
List<Result> results = new ArrayList<>();
for (CompletableFuture<Result> f : futures) {
   results.add(f.get());
}

The stream version is certainly less verbose, but which one should be preferred, and for what reasons?

Sidenote: I know I could run this example more easily in SQL with :firstname IN (..) and ... :lastname IN (..). But it's just an example about whether to use streams or futures.

The task could as well be to send multiple webservice requests in parallel, instead of db queries.

membersound
  • Would you read this as well? It might not work at all... https://stackoverflow.com/questions/44029856/using-jpa-objects-in-parallel-streams-with-spring – Eugene Sep 27 '17 at 11:00
  • I'd say broad and opinion based. The main beef is the same (things being run in parallel), and your example code even uses the common pool for both. I'd say it depends mainly on what other code exists. – Kayaman Sep 27 '17 at 11:02
  • @Eugene that of course depends on at what point you're binding the `EntityManager` to the thread. The given example doesn't show similar "thread-crossing" as the linked question. – Kayaman Sep 27 '17 at 11:08
  • The query is run with a `spring` `CrudRepository`, not directly on the entity manager. But that's a detail that should not matter; as written, the query could just as well be a webservice call without any db access at all. – membersound Sep 27 '17 at 11:53
  • @Eugene the linked question has nothing to do with this. I'm not running code like `*.findAll().parallel()`, which runs the parallel stream on the *ResultSet*. – membersound Sep 27 '17 at 11:55
  • @membersound yeah... I've noticed that as well. – Eugene Sep 27 '17 at 11:55
  • @Kayaman so if it is more or less the same, it's probably easier to stick with `parallelStream()`... – membersound Sep 27 '17 at 12:02

1 Answer

As you already said, "stream is certainly less verbose". Isn't that enough of a reason for you to prefer Stream? To be fair, I think we should also rewrite the second code sample, the one with CompletableFuture, using the Java 8 Stream API.

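List<Result> result = requests.stream()
        .map(req -> CompletableFuture.supplyAsync(() -> query(req.getFirstname(), req.getLastname())))
        .collect(toList()).stream() // collect first so every request is submitted before we wait
        .map(CompletableFuture::join) // join() instead of get(): no checked exceptions inside a lambda
        .collect(toList());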

It still looks much more verbose than:

List<Result> result = requests.parallelStream()
        .map(req -> query(req.getFirstname(), req.getLastname())).collect(toList());

However, I think the key point is how to control the number of concurrent threads. A parallel stream runs on ForkJoinPool.commonPool, whose size is derived from the number of CPU cores (by default one less than the core count, with the calling thread also taking part). That is usually too small for sending a large number of blocking web/db requests. For example, if there are tens or hundreds of requests to send, it is usually much faster to send them with 20 or more threads than with the common pool's default. Personally, I don't know a convenient way to specify the number of threads for a parallel stream. Here are some answers you can refer to: custom-thread-pool-in-java-8-parallel-stream
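For comparison, here is a minimal sketch of how the thread count could be controlled on the CompletableFuture side, using the two-argument `supplyAsync(Supplier, Executor)` overload. The class name, the `String[]` request shape, the simulated `query` method and the pool size are assumptions made for illustration only; they are not taken from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class BoundedPoolQueries {

    // Hypothetical stand-in for the blocking database or webservice call.
    static String query(String firstname, String lastname) {
        try {
            Thread.sleep(50); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastname + ", " + firstname;
    }

    // Runs all requests on a dedicated pool of the given size (an assumed tuning knob).
    static List<String> queryAll(List<String[]> requests, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Phase 1: submit everything so that all requests are in flight.
            List<CompletableFuture<String>> futures = requests.stream()
                    .map(req -> CompletableFuture.supplyAsync(() -> query(req[0], req[1]), pool))
                    .collect(Collectors.toList());
            // Phase 2: wait for each result; the output keeps the input order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String[]> requests = Arrays.asList(
                new String[] {"Ada", "Lovelace"},
                new String[] {"Alan", "Turing"},
                new String[] {"Grace", "Hopper"});
        System.out.println(queryAll(requests, 20));
    }
}
```

With blocking I/O the pool size can sensibly exceed the core count, because most threads spend their time waiting rather than computing. In a real application the executor would more likely be created once and shared than built per call.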

123-xyz
  • So I think one can conclude: use `streams` if only a few parallel threads are required, and `CompletableFuture` for a larger amount of parallels + to control the pool size. – membersound Sep 28 '17 at 08:47
  • That's right. If the number of threads a parallel stream provides is enough, there is no reason not to use it; after all, it's much simpler. Unfortunately, CompletableFuture: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletableFuture.html#runAsync-java.lang.Runnable- uses ForkJoinPool.commonPool() as well. You need to find a way in the answers to the question I linked. – 123-xyz Sep 28 '17 at 18:37
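
One workaround that is often suggested for parallel streams is to start the terminal operation from inside a task submitted to a dedicated `ForkJoinPool`. The sketch below illustrates that idea. Note that it relies on observed implementation behavior which the Stream API documentation does not guarantee, and that the class name, the simulated `query` method and the parallelism value are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class CustomPoolStream {

    // Hypothetical stand-in for the blocking database or webservice call.
    static String query(String name) {
        try {
            Thread.sleep(50); // simulate I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result for " + name;
    }

    static List<String> queryAll(List<String> names, int parallelism)
            throws InterruptedException, ExecutionException {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The stream's terminal operation is invoked from a worker of this pool,
            // so in practice its subtasks run here instead of in the common pool.
            return pool.submit(() -> names.parallelStream()
                    .map(CustomPoolStream::query)
                    .collect(Collectors.toList())).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(queryAll(Arrays.asList("a", "b", "c", "d"), 20));
    }
}
```

Because this depends on undocumented behavior, passing an explicit Executor to `CompletableFuture.supplyAsync` is probably the safer choice when the pool size matters.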