
I have a Java servlet which accesses a Hadoop cluster and sends a downloadable CSV of some data from the cluster as a response.

My issue is that there appear to be multiple GET requests to this servlet (which I understand to be intentional behavior in Chrome and other browsers), which is causing multiple connections to open to my Hadoop cluster. I don't want multiple requests at once. Is there a way to deny multiple requests from the same source and only respond to the first one?

Hans Landa
  • You can track on a per-session basis whether there is an ongoing computation, and modify your response accordingly. The problem is that if the client issues a second request, then it may no longer be able to receive a reply to the first at all. If it *can* do, then it will *expect* to do, so you should not try to ignore duplicate requests. If you're not going to process a request normally then return an appropriate HTTP status code, maybe a 409 ("Conflict") in this case. – John Bollinger Mar 20 '17 at 20:40
  • What about having a servlet filter which blocks a request that already has an established session? – Yohannes Gebremariam Mar 20 '17 at 20:46
  • It would be worthwhile to characterize the unwanted behavior better. Among other things, does it pertain *only* to GET requests? In that case, you could consider making clients POST instead. – John Bollinger Mar 20 '17 at 21:10
  • 2
    It is not "normal" for a web browser to make multiple requests without reason. You have something wrong with your code, web server configuration, or the user is actually triggering the request twice. – Brad Mar 20 '17 at 21:11
  • @Brad http://stackoverflow.com/questions/4460661/what-to-do-with-chrome-sending-extra-requests According to this, it is normal and known for certain web browsers to send multiple GET requests; this is why it's bad to have GET change application state. It does seem strange, though. What could be wrong with my server config such that multiple requests are being sent by the client? I'm the user, by the way, so unless it's happening behind the scenes, as far as I know I'm not sending the same request twice. – Hans Landa Mar 20 '17 at 21:15
  • What I've done to remedy this so far is declare a static boolean at the top of my code to hold whether or not a user is currently extracting data. However, this is just temporary and I'm wondering if there's a better solution (see the sketch after these comments). – Hans Landa Mar 20 '17 at 21:19
  • It's also not the code. The code works fine when running in Eclipse. However, when I move the WAR file to an Apache Tomcat environment it just loops as if the request is being sent multiple times (I have print statements). – Hans Landa Mar 20 '17 at 21:30
  • You could try setting the cache headers, although this may still require at least one successful response from your app in order to provide cached responses for subsequent requests. – Brad Mar 20 '17 at 21:56
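A note on the static-boolean workaround mentioned in the comments above: a plain static boolean is not safe under concurrent requests, because two threads can both read it as false before either sets it. Below is a minimal sketch of a safer variant using AtomicBoolean; the class and method names are illustrative rather than taken from the question's code, and the javax.servlet API (as used by Tomcat at the time) is assumed.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ExportServlet extends HttpServlet {

    // Shared across all requests; AtomicBoolean makes the check-and-set atomic.
    private static final AtomicBoolean EXPORT_RUNNING = new AtomicBoolean(false);

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // Claim the flag atomically; fail fast if another export holds it.
        if (!EXPORT_RUNNING.compareAndSet(false, true)) {
            resp.sendError(HttpServletResponse.SC_CONFLICT,
                    "An export is already in progress");
            return;
        }
        try {
            streamCsvFromHadoop(resp); // hypothetical helper for the actual export
        } finally {
            EXPORT_RUNNING.set(false); // always release the flag, even on error
        }
    }

    private void streamCsvFromHadoop(HttpServletResponse resp) throws IOException {
        // ... open the Hadoop connection and write the CSV to resp.getOutputStream() ...
    }
}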

2 Answers


It's unclear to me what the basis is for your claim that the behavior is intentional. The other SO question you cited in comments merely makes the same claim, without citation to any source for it. In the end, however, it probably doesn't matter: if the behavior is common, as opposed to being associated with a small number of specific buggy instances that you can just fix, then you probably need to deal with it regardless.

With that said, GET requests are not, in principle, supposed to change the state of the server (and it follows that they should be idempotent). This could be taken as justification for all kinds of interesting -- and annoying -- behavior. Inasmuch as there can be no justification for similarly making duplicate or preemptive POST requests, however, I anticipate that you could solve the problem by disabling the GET method for the resource in question, and forcing clients to instead POST requests for it. I do not think clients will then issue duplicate requests except as explicitly directed by users (e.g. via double-clicking the link / button).
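As a rough illustration of that approach (the class name and URL pattern here are hypothetical, and the standard javax.servlet API is assumed), the servlet can refuse GET outright and serve the download only in response to POST:

import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/export") // hypothetical URL pattern
public class PostOnlyExportServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Refuse GET so the browser cannot trigger the export speculatively.
        resp.setHeader("Allow", "POST");
        resp.sendError(HttpServletResponse.SC_METHOD_NOT_ALLOWED,
                "Use POST to request the CSV export");
    }

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/csv");
        resp.setHeader("Content-Disposition", "attachment; filename=\"data.csv\"");
        // ... connect to the Hadoop cluster and stream the CSV here ...
    }
}

A plain HTML form with method="post" (or a small script) can then submit the request in place of the original link.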

On the other hand, supposing that your web application is performing session tracking -- enabled by default in most servlet containers -- you can detect multiple concurrent requests and handle them. Specifically, you can set a session attribute when you begin handling such a request, clear it when you finish, and have the servlet test that attribute to determine how to handle each request.

I suggested in comments that you might return an error code for duplicate requests, and indeed you can, but that behavior might surprise clients because they may expect GET requests to be idempotent. As an alternative, you could consider deferring service on the duplicate requests until the computation is finished, and then serving identical responses to all the requests based on the same computation results.
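Here is a minimal sketch of the session-attribute technique, rejecting duplicates with a 409 as discussed above. The attribute and class names are illustrative, and synchronizing on the session object is a common but container-dependent idiom:

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class CsvExportServlet extends HttpServlet {

    private static final String IN_PROGRESS = "csvExportInProgress";

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        HttpSession session = req.getSession(); // creates a session if absent
        // Guard the check-then-set against racing requests in the same session.
        synchronized (session) {
            if (Boolean.TRUE.equals(session.getAttribute(IN_PROGRESS))) {
                resp.sendError(HttpServletResponse.SC_CONFLICT,
                        "An export for this session is already running");
                return;
            }
            session.setAttribute(IN_PROGRESS, Boolean.TRUE);
        }
        try {
            // ... connect to Hadoop and stream the CSV response ...
        } finally {
            session.removeAttribute(IN_PROGRESS); // clear the flag when done
        }
    }
}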

As far as I know, however, you cannot simply drop duplicate requests. There is simply no mechanism to do so anywhere in the Servlet API.

John Bollinger
  • There is no need to drop a request if you can send back a cached response. As you say, a GET is idempotent; it's just a question of setting a caching policy that is acceptable to users. – Brad Mar 20 '17 at 21:57
  • @Brad, the OP hypothesized denying the duplicate requests, though I suppose he didn't actually say "drop". But anyway, caching is not the issue, as you can cache only responses that you've *already* emitted. The OP wants to avoid devoting resources to computing responses to equivalent requests *concurrently* ("I don't want multiple requests ***at once***"), so in his case there's nothing (yet) to cache. – John Bollinger Mar 20 '17 at 22:12

Solution: Nginx Proxy Cache

You can put Nginx in front of the Java servlet. Nginx has a caching module and the mechanisms required to prevent a thundering herd.

Although your problem does not involve high concurrency, you can still use nginx's cache-locking mechanism, which locks the cache for a defined period of time and allows only one request through to fill it.

Documentation: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache

proxy_cache_path /var/cache/nginx keys_zone=csv_cache:10m;  # http context; path and zone name are placeholders
proxy_cache csv_cache;          # proxy_cache_lock has no effect unless caching is enabled
proxy_cache_lock on;
proxy_cache_lock_age 5s;        # depending on the use case
proxy_cache_lock_timeout 5s;
Holy_diver