
I am trying to access another service over HTTP to fetch data using HttpClient. The URI looks like endpoint:80/.../itemId.

I am wondering if there is a way to make a batch call for a set of itemIds. I did find a suggestion to call .setHeader(HttpHeaders.CONNECTION, "keep-alive") when creating the request. If I do that, how can I release the client after getting all the data?

Also, it seems this approach still has to wait for one response before sending the next request. Is it possible to do this asynchronously, and how? By the way, it seems I cannot use AsyncHttpClient in this case for some reason.

Since I know almost nothing about HttpClient, the question may look dumb. I really hope someone can help me solve the problem.

Meng Li
  • https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html – kichik Jun 26 '17 at 22:38
  • Please define _batching_ in the context of HTTP. Do you mean over a single TCP connection? The answer then is the keep-alive header. Or multiple IDs at once? Then the answer is _yes if the server supports this usage_ and has nothing to do with HTTP. Can you do it async? Yes, but that does not equate to batching in any way... Your question is simply too vague to answer. – kaqqao Jun 26 '17 at 23:15

1 Answer


API support on the server

There is a small chance that the API supports requesting multiple IDs at a time (e.g. using a URL of the form http://endpoint:80/.../itemId1,itemId2,itemId3). Check the API documentation to see if this is available, because if so that would be the best solution.
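
If the API does document such a format, the batch URI is just the item IDs joined together. A minimal sketch (the comma-separated path below is purely hypothetical; use whatever format the API actually documents):

// hypothetical batch request; only valid if the API documents a multi-ID endpoint
String batchUri = "http://endpoint:80/.../" + String.join(",", "itemId1", "itemId2", "itemId3");
HttpGet httpget = new HttpGet(batchUri);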

Persistent Connections

It looks like Apache HttpClient uses persistent ("keep alive") connections by default (see the Connection Management tutorial linked in @kichik's comment). The logging facilities could help to verify that connections are reused for several requests.
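
For example, one option from the tutorial's Logging chapter is to switch commons-logging to SimpleLog and raise the log level for org.apache.http via JVM system properties; at DEBUG level the connection manager logs when connections are leased, kept alive, and reused:

-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog
-Dorg.apache.commons.logging.simplelog.showdatetime=true
-Dorg.apache.commons.logging.simplelog.log.org.apache.http=DEBUG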

To release the client, use the close() method. From 2.3.4. Connection manager shutdown:

When an HttpClient instance is no longer needed and is about to go out of scope it is important to shut down its connection manager to ensure that all connections kept alive by the manager get closed and system resources allocated by those connections are released.

CloseableHttpClient httpClient = <...>
httpClient.close();
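
Since CloseableHttpClient implements Closeable, a try-with-resources block is a convenient way to guarantee the close() call:

try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
    // execute requests here
} // close() is called automatically, shutting down the connection manager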

Persistent connections eliminate the overhead of establishing new connections, but as you've noted the client will still wait for a response before sending the next request.

Multithreading and Connection Pooling

You can make your program multithreaded and use a PoolingHttpClientConnectionManager to control the number of connections made to the server. Here is an example based on 2.3.3. Pooling connection manager and 2.4. Multithreaded request execution:

import java.io.*;
import org.apache.http.*;
import org.apache.http.client.*;
import org.apache.http.client.methods.*;
import org.apache.http.client.protocol.*;
import org.apache.http.impl.client.*;
import org.apache.http.impl.conn.*;
import org.apache.http.protocol.*;

// ...
PoolingHttpClientConnectionManager cm =
        new PoolingHttpClientConnectionManager();
cm.setMaxTotal(200); // increase max total connection to 200
cm.setDefaultMaxPerRoute(20); // increase max connection per route to 20
CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(cm)
        .build();

String[] urisToGet = { ... };
// start a thread for each URI
// (if there are many URIs, a thread pool would be better)
Thread[] threads = new Thread[urisToGet.length];
for (int i = 0; i < threads.length; i++) {
    HttpGet httpget = new HttpGet(urisToGet[i]);
    threads[i] = new Thread(new GetTask(httpClient, httpget));
    threads[i].start();
}
// wait for all the threads to finish
for (int i = 0; i < threads.length; i++) {
    threads[i].join(); // join() throws InterruptedException; declare or handle it
}

class GetTask implements Runnable {
    private final CloseableHttpClient httpClient;
    private final HttpContext context;
    private final HttpGet httpget;

    public GetTask(CloseableHttpClient httpClient, HttpGet httpget) {
        this.httpClient = httpClient;
        this.context = HttpClientContext.create();
        this.httpget = httpget;
    }

    @Override
    public void run() {
        try {
            CloseableHttpResponse response = httpClient.execute(
                httpget, context);
            try {
                HttpEntity entity = response.getEntity();
                // process and fully consume the entity here so the
                // connection can be released back to the pool
            } finally {
                response.close();
            }
        } catch (ClientProtocolException ex) {
            // handle protocol errors
        } catch (IOException ex) {
            // handle I/O errors
        }
    }
}
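
If the number of URIs is large, a fixed-size thread pool bounds the number of concurrent requests instead of starting one thread per URI. Here is a minimal sketch using an ExecutorService (the pool size of 20 is illustrative; keep it in line with the connection manager limits above):

import java.util.*;
import java.util.concurrent.*;

// ...
ExecutorService executor = Executors.newFixedThreadPool(20);
List<Future<?>> futures = new ArrayList<>();
for (String uri : urisToGet) {
    futures.add(executor.submit(new GetTask(httpClient, new HttpGet(uri))));
}
for (Future<?> future : futures) {
    future.get(); // wait for each task; get() throws checked exceptions, handle or declare them
}
executor.shutdown();
httpClient.close();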

Multithreading will help saturate the link (keep as much data flowing as possible) because while one thread is sending a request, other threads can be receiving responses and utilizing the downlink.

Pipelining

HTTP/1.1 supports pipelining, which sends multiple requests on a single connection without waiting for the responses. The Asynchronous I/O based on NIO tutorial has an example in section 3.10. Pipelined request execution:

HttpProcessor httpproc = <...>
HttpAsyncRequester requester = new HttpAsyncRequester(httpproc);
HttpHost target = new HttpHost("www.apache.org");
List<BasicAsyncRequestProducer> requestProducers = Arrays.asList(
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/index.html")),
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/foundation/index.html")),
    new BasicAsyncRequestProducer(target, new BasicHttpRequest("GET", "/foundation/how-it-works.html"))
);
List<BasicAsyncResponseConsumer> responseConsumers = Arrays.asList(
    new BasicAsyncResponseConsumer(),
    new BasicAsyncResponseConsumer(),
    new BasicAsyncResponseConsumer()
);
HttpCoreContext context = HttpCoreContext.create();
// 'pool' is the NIO connection pool created earlier in the full tutorial example
Future<List<HttpResponse>> future = requester.executePipelined(
    target, requestProducers, responseConsumers, pool, context, null);
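
The returned Future can then be used to block until all pipelined responses have arrived, for example:

// wait for all the pipelined responses
List<HttpResponse> responses = future.get();
for (HttpResponse response : responses) {
    System.out.println(response.getStatusLine());
}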

There is a full version of this example in the HttpCore Examples ("Pipelined HTTP GET requests").

Older web servers may be unable to handle pipelined requests correctly.

tom
  • Thanks for the answer, I decided to go with Multithreading and Connection Pooling this time. Here is a follow-up question. If we spin up as many threads as the number of requests, is it possible to crash the daemon if we don't control the number of parallel threads? If so, could I use an ExecutorService to set up a fixed pool? Also, for the intermediate result of each thread, could I store it inside a concurrent map (thread safe), or do I have to use the Future interface? – Meng Li Jul 11 '17 at 00:20
  • From [this question](https://stackoverflow.com/questions/763579/), if you have more than a thousand threads you risk running out of memory. An ExecutorService would be fine. A concurrent map would work (as long as you wait for all the tasks to finish using `Thread.join()` or `ExecutorService.awaitTermination()` before using the results). – tom Jul 11 '17 at 09:52
  • If you need more help, please ask a new question. I see that you already asked [a question](https://stackoverflow.com/questions/44939012/how-to-test-multi-thread-logic-in-java) a few days ago but didn't get good responses; be sure to use [minimal, complete examples](https://stackoverflow.com/help/mcve) in future, and if you still don't get good responses you can ping me by writing a comment on one of my answers (include a link to your question). Don't abuse this feature though :) – tom Jul 11 '17 at 10:16
  • Thanks for your kind reply. Currently, to avoid too many parallel threads crashing the daemon, I am using an ExecutorService to set aside a fixed pool of threads and using the Future interface to collect responses. I do have a follow-up: how can I release a connection to allow another connection? Right now, I close the input stream to trigger the release; I don't know if there are better ways? – Meng Li Jul 17 '17 at 21:37
  • Closing the input stream is the correct way to release the connection, see [Ensuring release of low level resources](https://hc.apache.org/httpcomponents-client-ga/tutorial/html/fundamentals.html#d5e145). – tom Jul 18 '17 at 03:41
  • Like I mentioned before, I need to set up a fixed number of threads to avoid too many threads crashing the daemon. So how do I decide the maximum number of threads my host can handle? – Meng Li Jul 26 '17 at 23:59
  • @MengLi: Please ask a new question with more details (which daemon is crashing, any error messages or stack traces you get, how many threads you are currently using, and a minimal working example showing how you manage the threads and replicating the crash if possible). I won't respond to more comments here. – tom Jul 27 '17 at 22:37