Based on your question, I assume you cannot use an event-based/reactive approach, and that the architectural decision has already been made with the trade-offs considered here (note: this source refers to the approach proposed below as a 'Hybrid').
Orchestration
Under these conditions, the pattern you're looking for is called Orchestration. Check out this great answer for a wider overview.
As a quick recap, you can use something like Spring Integration to implement the following key points:
- While processing the request to A, execute the calls to B & C concurrently where possible to achieve the quickest response time from A
- Accumulate, transform and aggregate the results of the concurrent calls into a complete response entity
- Use thread pools to limit the number of concurrently running requests to B and C, so that cascading failures are not amplified
- Fail fast: cancel the subsequent batch of calls early if any request fails (i.e. do not call C if the call to B was not successful)
- Cut-off: define the maximum time you can wait for the currently running batch of calls to B & C to complete, and have A respond with an error once that time has elapsed
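To make these points concrete without tying them to Spring Integration's API, here is a minimal sketch using plain `java.util.concurrent.CompletableFuture` as a swapped-in technique. It assumes B and C are plain synchronous calls, represented by the hypothetical `callB`/`callC` functions, and that a call to C depends on B succeeding while a second, independent call to B can run in parallel. The class and method names are illustrative only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class OrchestrationSketch {

    // Orchestrates one request to A; callB/callC are hypothetical stand-ins for the remote calls.
    static String handleRequestToA(String request,
                                   Function<String, String> callB,
                                   Function<String, String> callC,
                                   ExecutorService pool,
                                   long cutoffMillis) {
        // Fail fast: C is called only after B succeeds; if B throws,
        // the thenApplyAsync stage is skipped and the failure propagates.
        CompletableFuture<String> bThenC = CompletableFuture
                .supplyAsync(() -> callB.apply(request), pool)
                .thenApplyAsync(b -> b + "+" + callC.apply(request), pool);

        // An independent call to B that runs concurrently with the chain above.
        CompletableFuture<String> bOther = CompletableFuture
                .supplyAsync(() -> callB.apply(request + "-other"), pool);

        return bThenC
                // Aggregate both results into a single response.
                .thenCombine(bOther, (first, second) -> first + ";" + second)
                // Cut-off: stop waiting after cutoffMillis (running tasks are not interrupted).
                .orTimeout(cutoffMillis, TimeUnit.MILLISECONDS)
                // Respond from A with an error on any failure or timeout.
                .exceptionally(ex -> "ERROR")
                .join();
    }

    public static void main(String[] args) {
        // The bounded pool caps how many calls to B and C run at once.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            String response = handleRequestToA("req",
                    r -> "B(" + r + ")",   // hypothetical stub for service B
                    r -> "C(" + r + ")",   // hypothetical stub for service C
                    pool, 1000);
            System.out.println(response); // prints "B(req)+C(req);B(req-other)"
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that `orTimeout` only stops A from waiting; in a real setup you would also want client-side timeouts on the calls to B and C so that pool threads are released.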
Update - rely on an implementation of the Reactor pattern on the client side
If you can use Spring 5/Spring Boot 2.x, you can also make the calls to B & C reactively using Spring WebFlux (built on Project Reactor) to achieve the points above.
Schematically, you can do something like:
@Service
public class MyService {
    private final WebClient webClient;
    ...
    public Mono<Details> someRestCall(String name) {
        // "{name}" is expanded from the name argument, e.g. "serviceB/details"
        return this.webClient.get().uri("{name}/details", name)
                .retrieve()
                .bodyToMono(Details.class);
    }
}
...
Mono<Details> b1 = myService.someRestCall("serviceB");
Mono<Details> c1 = myService.someRestCall("serviceC");
Mono<Details> b2 = myService.someOtherRestCall("serviceB");
List<Details> response = Flux
        .merge(b1, c1, b2)            // subscribe to all calls concurrently; an error cancels the rest
        .limitRequest(MAX_REQUESTS)   // request at most MAX_REQUESTS results
        .collectList()                // aggregate the results into one response
        .timeout(CUTOFF_TIMEOUT)      // cut-off, a java.time.Duration
        .onErrorReturn(ERR_RESPONSE)  // error response on failure or timeout
        .block();
(based on this example)