6

Environment:
Java-EE based web application


Problem:
Need to prevent a user (mainly bots) from making more than 5 (for example) requests within the same second


Solution :
As a basic design I am planning to have two synchronized Maps in application scope, each of this type:

Map<String, Map<Long, Integer>>

String is for sessionId of request

Long is for current second representation

Integer is to hold request count


Process:

Step 0:

Configuring a Filter to intercept each request

Step 1:

Determine which map to use: if the current minute is odd, I add data to mapOne and clear mapTwo (and the reverse when it is even)

Step 2:

process map

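String sessionId = request.getSession().getId();
Map<Long, Integer> counts = mapXX.get(sessionId);
if (counts == null) { // first request seen for this session
    counts = new HashMap<Long, Integer>();
    mapXX.put(sessionId, counts);
}
Integer requestNoForThisSecond = counts.get(currentSecondRepresentationInLong);
if (requestNoForThisSecond == null) requestNoForThisSecond = 0; // first request in this second
if (requestNoForThisSecond < 5) {
    counts.put(currentSecondRepresentationInLong, requestNoForThisSecond + 1);
} else {
    response.sendRedirect(captchaPageUrl); // captchaPageUrl: placeholder for some captcha page
}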

Step 3:

Also remove the session entry if the session expires or the user logs out


This is a very basic design for the problem.

Does anyone have a better idea or suggestion?

jmj
  • I do not think that this question has anything at all to do with Java or Java-ee. No? I mean, this is a discussion about an approach, and that's language-agnostic. If you want us to actually check your code, then the appropriate site for that would be CodeReview.StackExchange.com – Mike Nakis Jan 04 '12 at 12:38
  • See http://stackoverflow.com/questions/667508/whats-a-good-rate-limiting-algorithm – skaffman Jan 04 '12 at 12:44
  • Is the limit per second for a session id, or per second for a user, who can open many sessions in different browsers? – Dead Programmer Jan 04 '12 at 13:01
  • @jigar OK, how about using the GeoLite IP address database and Java API? http://blog.anthonychaves.net/2006/07/14/maxminds-geolitecity-database-and-java-api/ – Dead Programmer Jan 04 '12 at 13:06
  • @jigar: kindly look at this link: http://stackoverflow.com/questions/4699352/solving-the-double-submission-problem/4699622#4699622 – Dead Programmer Jan 04 '12 at 13:28
  • @Suresh your answer is good in the context of that question, but we can't assume that a bot will submit a message body with its request. Thanks! – jmj Jan 04 '12 at 13:32
  • @jigar: ah! I understand the problem now. I was misled by "Java-EE based web application". – Dead Programmer Jan 04 '12 at 13:33
  • If it's for robots only, have you considered the `Crawl-delay` directive in `robots.txt`? With it you can specify the number of seconds a bot has to wait between successive requests. For 5 requests per minute, that would be `Crawl-delay: 12`. True, not all robots adhere to it (homegrown ones, leechers, etc.), but self-respecting robots like Googlebot do. – BalusC Jan 04 '12 at 14:50

5 Answers

3

First of all, I think you should forget the idea of the session id, and use IP addresses instead. You do not expect the bot to be sending you the necessary cookies so that you can keep track of its session, do you?

Secondly, I think that your approach is unnecessarily complicated. All you need is a map of IP-address to array-of-time[N] where N is a fixed number, the number of requests you are planning to allow per second. (I am assuming it will be relatively low.) So, every time you have a request from a given IP, you shift the contents of the array down by one, and you add the time of the new request to the end of the array. Then, you subtract the time at index 0 of your array from the time at the last index, and this gives you the amount of time it took that IP to send you N requests, which you can trivially convert to number of requests per second.
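A minimal sketch of this idea, under some assumptions: the class name `SlidingWindowLimiter`, the method `allow`, and the choice of a `Deque` instead of a shifted array are illustrative and not from the answer. It keeps one extra timestamp, so that the newest request is compared with the one N requests before it, and the caller passes in the current time (normally `System.currentTimeMillis()`).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch: remembers the most recent request times per client key (e.g. an IP address). */
class SlidingWindowLimiter {
    private final int maxRequests;   // N: requests allowed per window
    private final long windowMillis; // window length, e.g. 1000 ms
    private final Map<String, Deque<Long>> recent = new HashMap<String, Deque<Long>>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    /** Records a request at nowMillis; returns false if it exceeds the allowed rate. */
    synchronized boolean allow(String clientKey, long nowMillis) {
        Deque<Long> times = recent.get(clientKey);
        if (times == null) {
            times = new ArrayDeque<Long>();
            recent.put(clientKey, times);
        }
        times.addLast(nowMillis);
        if (times.size() <= maxRequests) {
            return true; // not enough history yet to exceed the limit
        }
        // Equivalent of "shifting the array down by one": drop the oldest entry.
        long oldest = times.removeFirst();
        // oldest is the request made N requests before this one; if both fall
        // inside one window, the client made N + 1 requests in that window.
        return nowMillis - oldest >= windowMillis;
    }
}
```

In a filter this could be called as `limiter.allow(request.getRemoteAddr(), System.currentTimeMillis())`. Rejected requests are still recorded here, so a client that keeps flooding stays blocked until it slows down; entries for idle clients are never evicted in this sketch.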

Also, you might find this discussion interesting: https://softwareengineering.stackexchange.com/questions/126700/development-of-a-bot-web-crawler-detection-system

Mike Nakis
  • IP addresses might not work. How will he tell apart clients hidden behind a firewall or on a VPN? – Buhake Sindi Jan 04 '12 at 12:36
  • s,might,will, -- the session ID _is_ what must be used here – fge Jan 04 '12 at 12:42
  • This will only pose a problem if we are talking about a **huge** network behind that firewall or VPN, and in that case, the people connecting from it will just have to live with some throttling. I do not see a problem with that. – Mike Nakis Jan 04 '12 at 12:54
  • @fge so, you believe that the bot will be kind enough to send the session cookie? – Mike Nakis Jan 04 '12 at 12:57
1

Probably a very bad hack but...

Implement a custom Set<Long> for which the .add() operation returns false once you have pushed the same long value more times than the threshold allows, and use that as the value type of your map?

The code would then look like:

if (!theMap.get(whatever()).add(secondInLong))
    // threshold reached

One advantage is that it would prevent a race condition in your current code: if only your map is synchronized, the check of the request count followed by the increment is not protected. With this solution, it is.

Or surround the code with a lock of some sort, and use a "normal" map.

Taking this idea further, you could even implement a custom Map with delegation. The "long in second" representation would then be calculated within the map itself and you wouldn't need to take care of it.
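One possible shape for such a structure, as a sketch only: the name `ThresholdSet` is made up for illustration, and the class does not implement the full `Set<Long>` interface, only the `add` operation described above. It assumes `ConcurrentHashMap` plus `AtomicInteger` are acceptable for making the count-and-check step atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a set-like counter: add() returns false once a value was added more than `threshold` times. */
class ThresholdSet {
    private final int threshold;
    private final ConcurrentMap<Long, AtomicInteger> counts =
            new ConcurrentHashMap<Long, AtomicInteger>();

    ThresholdSet(int threshold) {
        this.threshold = threshold;
    }

    /** Counts one more occurrence of `second`; the increment and the check are a single atomic step. */
    boolean add(long second) {
        AtomicInteger counter = counts.get(second);
        if (counter == null) {
            AtomicInteger fresh = new AtomicInteger();
            counter = counts.putIfAbsent(second, fresh);
            if (counter == null) {
                counter = fresh; // this thread installed the counter
            }
        }
        return counter.incrementAndGet() <= threshold;
    }
}
```

With one `ThresholdSet` per session as the map value, the check from the question becomes a single call. Counters for past seconds are never removed in this sketch, so some cleanup (such as the two alternating maps from the question) would still be needed.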

fge
1

5 requests a second is equivalent to 1 request every 0.2 seconds. So why not simply have a Map which stores sessionID and last System.nanoTime() of the user, and your filter then only has to do a quick evaluation to check that at least 200ms have elapsed since the user's last request.
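As a rough sketch of that check (the class name `MinIntervalLimiter` and the method `allow` are illustrative, and the caller is assumed to pass in `System.nanoTime()`):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;

/** Sketch: allows a request only if enough time has passed since the previous one. */
class MinIntervalLimiter {
    private final long minGapNanos;
    private final ConcurrentMap<String, Long> lastSeen = new ConcurrentHashMap<String, Long>();

    MinIntervalLimiter(long minGapMillis) {
        this.minGapNanos = TimeUnit.MILLISECONDS.toNanos(minGapMillis);
    }

    /** nowNanos would normally be System.nanoTime(). */
    boolean allow(String sessionId, long nowNanos) {
        // put() returns the previous timestamp, so the read and the update are one atomic step.
        Long previous = lastSeen.put(sessionId, nowNanos);
        return previous == null || nowNanos - previous >= minGapNanos;
    }
}
```

Note that this is stricter than "5 per second": two requests 1 ms apart are rejected even if the session is otherwise quiet.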

mcfinnigan
  • Well, it may be the case that the first request comes at 1000 ms, the second at 1001 ms, and then the user behaves normally. Your idea is good, but it does not completely fulfil the stated requirement. Thanks! – jmj Jan 04 '12 at 12:46
1

There's a Synchronizer Token Pattern. This pattern was suggested to prevent double-submission, Cross-Site Request Forgery, etc. Struts uses this pattern extensively (example mentioned on JavaRanch).


For those who don't know how Synchronizer Token Pattern works, here goes:

  1. User requests a page. On the server, the controller responsible for the page request retrieves a token (not JSESSIONID) from the page.
  2. If the token returned from the request matches the token found on the session, it's a valid token, continue.
  3. Reset the token (generate a new token) and save it in the session. Thus, you can do validation and return to the same page with a new token every time.
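The three steps above can be sketched without any servlet classes. The class `SynchronizerToken` below is hypothetical: in a real application one instance would be stored as a session attribute and its current value rendered into each page as a hidden field.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/** Sketch of a per-session synchronizer token. */
class SynchronizerToken {
    private static final SecureRandom RANDOM = new SecureRandom();
    private String current = newToken();

    private static String newToken() {
        return new BigInteger(130, RANDOM).toString(32); // random, hard-to-guess string
    }

    /** The value to embed in the page sent to the user. */
    synchronized String current() {
        return current;
    }

    /** Steps 2 and 3: accept the submitted token only if it matches, then rotate it. */
    synchronized boolean consume(String submitted) {
        if (!current.equals(submitted)) {
            return false; // stale, missing, or forged token
        }
        current = newToken();
        return true;
    }
}
```

A request that replays an old token, or sends none at all, fails `consume` and can be counted or redirected.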

With your approach, you would have to time each submission, keep a count of the session tokens retrieved (using HttpSessionListener), and limit the request calls accordingly.

I hope this helps.

Buhake Sindi
0

Sounds reasonable, and similar to what was suggested in this article for Spring<->captcha integration.

milan