
I have a servlet.Filter implementation that looks up a client's user ID in a database table (based on the IP address) and attaches this data to an HttpSession attribute. The filter does this whenever it receives a request from a client without a defined HttpSession.

In other words, if there is no session attached to a request, the filter will:

  • create a session for the client
  • do a database lookup for the user ID
  • attach the user ID as a session attribute

This all works fine if there is some time in between requests from a "session-less" client.

But if a "session-less" client sends 10 requests within milliseconds of each other I end up with 10 sessions and 10 database queries. It still "works" but I don't like all of these sessions and queries for resource reasons.

I think this is because the requests are so close together. When a "session-less" client sends a request and gets a response before another request is sent, I don't have this problem.

The relevant parts of my filter are:

// some other imports

import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapHandler;

public class QueryFilter implements Filter {

    private QueryRunner myQueryRunner;  
    private String myStoredProcedure;
    private String myUserQuery;
    private MapHandler myMapHandler;

    @Override
    public void init(final FilterConfig filterConfig) throws ServletException {
        Config config = Config.getInstance(filterConfig.getServletContext());
        myQueryRunner = config.getQueryRunner();
        myStoredProcedure = config.getStoredProcedure();
        myUserQuery = filterConfig.getInitParameter("user.query");
        myMapHandler = new MapHandler();
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) 
            throws ServletException {

        HttpServletRequest myHttpRequest = (HttpServletRequest) request;
        HttpServletResponse myHttpResponse = (HttpServletResponse) response;
        HttpSession myHttpSession = myHttpRequest.getSession(false);
        String remoteAddress = request.getRemoteAddr();

        // if there is not already a session
        if (null == myHttpSession) {

            // create a session
            myHttpSession = myHttpRequest.getSession();

            // build a query parameter object to request the user data
            Object[] queryParams = new Object[] { 
                myUserQuery, 
                remoteAddress
            };

            // query the database for user data
            try {
                Map<String, Object> userData = myQueryRunner.query(myStoredProcedure, myMapHandler, queryParams);

                // attach the user data to session attributes
                for (Entry<String, Object> userDatum : userData.entrySet()) {
                    myHttpSession.setAttribute(userDatum.getKey(), userDatum.getValue());
                }

            } catch (SQLException e) {
                throw new ServletException(e);
            }

            // see below for the results of this logging
            System.out.println(myHttpSession.getCreationTime());
        }

        // ... some other filtering actions based on session
    }
}

Here are the results of logging myHttpSession.getCreationTime() (timestamps) from ONE client:

1343944955586
1343944955602
1343944955617
1343944955633
1343944955664
1343944955680
1343944955804
1343944955836
1343944955867
1343944955898
1343944955945
1343944955945
1343944956007
1343944956054

As you can see, almost all the sessions are different. These timestamps also give a good idea of how closely the requests are spaced (20 ms to 50 ms apart).

I can't redesign all the client-side applications to ensure that they wait for at least one response before sending further requests, so I want to handle this in my filter.

Also, I don't want to just make the subsequent requests fail; I would like to figure out a way to handle them.

Question

  • Is there a way to put subsequent requests from the same client (IP address) into "limbo" until a session has been established from the first request?

  • And, if I manage that, how can I get the correct HttpSession (the one that I attached the user data to) when I call aSubsequentRequest.getSession() afterwards? I don't think I can assign a session to a request but I could be wrong.

Maybe there is some better way to go about this entirely. I basically would just like to stop this filter from running the lookup query 10-20 times unnecessarily within a two-second period.

egerardus
  • In this case maybe the application context helps with keeping track of existing requests? If a request comes from the same IP, log it in the application context; before making a new lookup, make sure it is not already in the application context? – kosa Aug 01 '12 at 00:26
  • @thinksteep that makes sense, but is there a way of applying the stored session to a different request with the same IP address? – egerardus Aug 01 '12 at 00:35
  • This is a rather strange requirement. Based on your comment in one of the answers below, I understood that the concrete problem for which you thought *this* would be the right solution boils down to firing multiple ajax requests from a single page which are not fired in a queue. The solution to that is actually pretty straightforward: just fire them in a queue on the JavaScript side. Most if not all of the existing ajax-based MVC frameworks (like JSF) already do exactly that deep under the covers. – BalusC Aug 04 '12 at 15:54
  • @BalusC `just fire them in a queue in JavaScript side` does that mean, "wait for the response in JS before sending another request"? Wouldn't I lose the "A" in "ajax"? If I do it on the java side I can only make it synchronous when I need to (i.e. when there is no session established). – egerardus Aug 06 '12 at 06:20
  • How can you differentiate one client from another before you have a session id? – Christopher Schultz Aug 07 '12 at 01:08
  • @ChristopherSchultz it's all intranet so using IP address – egerardus Aug 07 '12 at 03:26
  • That's not going to work for you, unfortunately. Proxy servers, NAT, etc. are all going to conspire against you such that some clients (you'll never be able to guess which ones) will be indistinguishable from each other, and you'll end up mixing them up and ultimately (probably) cross-pollinating their credentials (which is obviously bad). Are these automated clients using some kind of HTTP-based API? If so, I think you need to mandate that clients first obtain a session and only then start bombarding your service with additional requests. Otherwise, the clients risk obtaining multiple sessions. – Christopher Schultz Aug 07 '12 at 13:57
  • @ChristopherSchultz I noticed that now too. I changed this to use a modified [**waffle**](http://dblock.github.com/waffle/) filter now to obtain user credentials if the user is on the domain; if not, it will pop up the authorization dialog. To handle the initial problem of concurrent requests, I tried fooling around with thread locking but I can't seem to get it right. Based on your and BalusC's comments, it seems the only thing to do is to make each web app establish a session first before doing more requests. Doing that now. If you want to post an answer to that effect I can accept it. – egerardus Aug 07 '12 at 15:32

6 Answers

1

I would cache the database lookup and find some way to invalidate the cache when the database changes, or use a timeout in the cache. For instance, Google's Guava has a cache that will invalidate an entry after a specified amount of time. Here's some basic code. Setting the attribute on the session with the same value should be fine. One could also use an HttpSessionListener to invalidate the particular cache entry that contains the user ID when the session is destroyed; a sketch of that follows the code below.

static LoadingCache<String, String> ipAddressToUserLookupCache = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build(
            new CacheLoader<String, String>() {
              public String load(String ipAddress) throws Exception {
                // find the user ID
                return "<user id>";
              }
            });

@Override
public void doFilter(ServletRequest req, ServletResponse resp, FilterChain fc) throws IOException,
        ServletException {
    final String ipAddress = req.getRemoteAddr();
    // getUnchecked avoids the checked ExecutionException that get() throws
    final String userName = ipAddressToUserLookupCache.getUnchecked(ipAddress);
    ((HttpServletRequest) req).getSession(true).setAttribute("username", userName);
    fc.doFilter(req, resp);
}
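For the HttpSessionListener idea mentioned above, here is a minimal sketch. It assumes the filter also stores the client IP in a hypothetical "ipAddress" session attribute and that the static cache from the snippet above is reachable (e.g. via a holder class); register the listener in web.xml or with @WebListener on Servlet 3.0+:

import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

public class UserCacheEvictionListener implements HttpSessionListener {

    @Override
    public void sessionCreated(HttpSessionEvent se) {
        // nothing to do when a session appears
    }

    @Override
    public void sessionDestroyed(HttpSessionEvent se) {
        // evict the cached lookup so the next session for this client
        // triggers a fresh database query
        Object ip = se.getSession().getAttribute("ipAddress"); // hypothetical attribute
        if (ip != null) {
            ipAddressToUserLookupCache.invalidate(ip); // the static cache above
        }
    }
}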
  • The problem is that the first request has not finished looking up the user ID before another request comes in from this same IP address. I am not able to put anything into a cache before I need to find it again. Unless I misunderstood your answer. I'll try to make the question clearer. – egerardus Aug 02 '12 at 22:19
  • Requests from the same IP address will block (i.e. not go to the database) until the user ID is retrieved. – mindas Aug 04 '12 at 16:13
1

You are dealing with the Thundering Herd Problem. The best way to solve it is to use a caching implementation that deals with this for you. Here is one way to solve it:

  1. In the filter, use a Google Guava loading cache and look up the information you want using the session ID. Guava's cache is designed so that if a key is not in the cache and n threads hit the cache looking for an object at the same time, only one thread will call the load method and the others will block while the item is brought into the cache. Do not set an upper limit on this Guava cache, since the size of the cache will be the same as the number of HTTP sessions, given your desire to store items in the session. If the issue is that multiple HttpSessions are being created by the container for requests that arrive at the same time, then cache based on something in the request that does not change, such as a user ID or some of the fields from queryParams in your example code.

  2. Write an HttpSessionListener. It will be called automatically by the servlet container when sessions expire or are invalidated; in the HttpSessionListener you can then call the invalidate method on the Guava cache. That way items get added to the cache on the first request and kicked out of the cache on session expiry.

  3. You can also implement HttpSessionActivationListener, which will notify you when sessions are passivated to disk by the web container. This can happen for a variety of reasons, such as low memory, or a client that has not sent a request in a while although the session is not yet expired, so it gets passivated. On passivation events it would make sense to evict your item from the cache, and on activation events to put it back in (see the sketch after this list).

  4. You must make sure that the items you put in the cache are thread safe; I would recommend immutable objects, built with safe object construction techniques.
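A minimal sketch of item 3, under the assumption that the cached user data is stored in the session as an attribute: the container notifies session attributes that implement HttpSessionActivationListener. The CachedUserData class and the UserLookupCaches holder are hypothetical names:

import java.io.Serializable;

import javax.servlet.http.HttpSessionActivationListener;
import javax.servlet.http.HttpSessionEvent;

public class CachedUserData implements HttpSessionActivationListener, Serializable {

    private final String ipAddress; // key into the hypothetical lookup cache

    public CachedUserData(String ipAddress) {
        this.ipAddress = ipAddress;
    }

    @Override
    public void sessionWillPassivate(HttpSessionEvent se) {
        // evict the entry so a passivated session does not pin cache memory
        UserLookupCaches.cache.invalidate(ipAddress); // hypothetical holder class
    }

    @Override
    public void sessionDidActivate(HttpSessionEvent se) {
        // nothing eager needed; the next cache get() for this key reloads it
    }
}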

I am using the above techniques in my application, which is based on Spring, so I am doing the above with some slight modifications.

  1. I am using Spring application context events to fire an event when something that can invalidate a cache happens; that way the caches can just listen for events on the application context and invalidate their state. Session activation/passivation and creation/destruction fire events, and then multiple caches can react.

  2. I don't make use of a filter and use natural keys; for example, a user profile cache is keyed on the user ID and is not populated until someone asks for the user profile for user ID 12304.

  3. I am religious about thread safety and about using immutable objects in all the caches. This means you have to have immutable data structures such as lists, maps, etc. This is another area where Google Guava is just amazing: you get a lot of useful data structures, including immutable list, map, set, and multimap.

If you need code samples let me know.

Another possibility is to use synchronization in your filter. That will hurt performance but will make things serial, as sketched below.
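For that synchronization route, a rough sketch (illustrative names, not from this answer's codebase) that serializes lookups per client address and caches the result per IP, so the query runs once even though each early request may still get its own session:

import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class SerializingLookupFilter implements Filter {

    // one lock per client IP; putIfAbsent guarantees all threads agree on it
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<String, Object>();
    // lookup results per IP, so the database is queried only once per client
    private final ConcurrentMap<String, Object> userIdsByIp = new ConcurrentHashMap<String, Object>();

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        String remoteAddress = request.getRemoteAddr();

        locks.putIfAbsent(remoteAddress, new Object());
        synchronized (locks.get(remoteAddress)) {
            Object userId = userIdsByIp.get(remoteAddress);
            if (userId == null) {
                userId = lookupUserId(remoteAddress); // hypothetical database call
                userIdsByIp.put(remoteAddress, userId);
            }
            // concurrent first requests may still each get their own session,
            // but the expensive lookup now happens only once per client
            ((HttpServletRequest) request).getSession().setAttribute("userId", userId);
        }

        chain.doFilter(request, response);
    }

    private Object lookupUserId(String ip) {
        return ip; // stub; replace with the real database query
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}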

ams
1

I think what you need to do is require that your clients authenticate (successfully) first, then make additional requests. Otherwise, they run the risk of generating multiple sessions (and having to maintain them separately). That's really not so bad of a requirement IMO.

If you are able to rely on NTLM credentials, then you could perhaps set up a map of user -> token, where you place a token into the map upon first connect; all subsequent requests then block (or fail) until one of them successfully completes the authentication step, at which point the token is removed (or updated, so you can use the preferred session ID).
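A rough sketch of that map, assuming a per-client key (for example the NTLM user name, falling back to the IP) and using a CountDownLatch as the "token"; all names here are illustrative:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class AuthGate {

    private final ConcurrentMap<String, CountDownLatch> pending =
            new ConcurrentHashMap<String, CountDownLatch>();

    /** Returns true if the caller is the first request and must authenticate. */
    public boolean tryBegin(String clientKey) {
        return pending.putIfAbsent(clientKey, new CountDownLatch(1)) == null;
    }

    /** Called by the authenticating request once the session is established. */
    public void complete(String clientKey) {
        CountDownLatch latch = pending.remove(clientKey);
        if (latch != null) {
            latch.countDown();
        }
    }

    /** Called by every other request; blocks until auth completes or times out. */
    public boolean await(String clientKey, long timeoutMillis) throws InterruptedException {
        CountDownLatch latch = pending.get(clientKey);
        return latch == null || latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

A filter would call tryBegin() on each session-less request: the winner authenticates and calls complete(); the rest call await() and then proceed with the now-established credentials.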

Christopher Schultz
0

By doing a check first (to see if the request has a session) you have a race condition.

You should instead use:

request.getSession()

If you check the javadoc for HttpServletRequest, you'll see:

Returns the current session associated with this request, or if the request does not have a session, creates one.

If you use that method, both calls should return the same session; then you can check for the existence of the userID attribute before trying to set it.
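As a sketch, using the variable names from the question and assuming one of the looked-up attributes is a hypothetical "userId" marker (though, as the comments below note, a race between concurrent first requests remains):

HttpSession myHttpSession = myHttpRequest.getSession(); // existing session, or a new one

// run the lookup only if this session has not been populated yet
if (myHttpSession.getAttribute("userId") == null) {
    Map<String, Object> userData = myQueryRunner.query(myStoredProcedure, myMapHandler, queryParams);
    for (Entry<String, Object> userDatum : userData.entrySet()) {
        myHttpSession.setAttribute(userDatum.getKey(), userDatum.getValue());
    }
}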

MattR
  • `If you use that method both calls should return the same session` that's what I thought also but it is giving me different sessions. I think because these requests are so close to each other from the same client, the filter is receiving a second request from the client before it creates the first session. (I think) – egerardus Aug 01 '12 at 00:39
  • Yes, thinking about it some more I realised that's not enough - the race condition still exists... Might be better to design your userId solution so that it doesn't matter. – MattR Aug 01 '12 at 02:30
0
  1. Just want to ask: how can you really have a scenario like this in the real world, where multiple requests (more than 2-3) are sent from the same IP or the same client only 20 ms apart? In the app I work on, when I try to click the submit button again, it won't submit the page again and behaves in an intelligent manner.

  2. Basically, we usually make sure that the application is double-submit proof. Please refer to this link for more info: Solving the Double Submission Problem.

I think that if you can avoid a scenario like double or multiple submits from the same client, your problem won't arise.

Metalhead
  • The scenario for this filter (what it is used for), is to attach an identifying user ID to Ajax requests coming from single-page intranet webapps. It's never used for handling submitted forms, so it is not just a double click problem. The particular webapp I show logged (with 14 sessions created for the same client) is because it initializes with 14 different Ajax requests right away. – egerardus Aug 04 '12 at 04:09
  • Thanks for clarifying. Right now I can only think of a lock/synchronized-based approach, e.g. using a ConcurrentHashMap for storing the IP addresses and then performing the lookup using the putIfAbsent method for thread safety. – Metalhead Aug 04 '12 at 16:45
0

Truly the easiest solution is to use one of the several caching frameworks that offer a self-populating strategy.

Basically what this means is that when you go to the cache for a specific key, if the key does not exist, you've provided a function to create the data for that key.

While that function is executing, any other access to that same key blocks.

So if you try to hit the cache for a specific IP, the cache sees there is no entry for it. Then it calls your routine to load from the database. While that loads, everyone else asking for the same IP simply waits until the routine is done, and then they all return the same value.

Ehcache is a framework that supports this; there are certainly others.

You want to use a framework for this because they've gone through all of the pain of managing the locks and contention etc. for you.
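A minimal sketch with Ehcache 2.x's SelfPopulatingCache, assuming a cache named "userLookup" is declared in ehcache.xml; the UserLookup class and the lookupUserId stub are illustrative stand-ins for your database routine:

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.constructs.blocking.CacheEntryFactory;
import net.sf.ehcache.constructs.blocking.SelfPopulatingCache;

public class UserLookup {

    private static final CacheManager MANAGER = CacheManager.create();

    // a miss runs the factory; other threads asking for the same key block
    // until the entry is populated, then all of them see the same value
    private static final SelfPopulatingCache CACHE = new SelfPopulatingCache(
            MANAGER.getEhcache("userLookup"), // configured in ehcache.xml
            new CacheEntryFactory() {
                @Override
                public Object createEntry(Object key) throws Exception {
                    return lookupUserId((String) key);
                }
            });

    public static String userIdFor(String ipAddress) {
        Element element = CACHE.get(ipAddress);
        return (String) element.getObjectValue();
    }

    private static String lookupUserId(String ip) {
        return "user-for-" + ip; // stub; replace with the real database query
    }
}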

Will Hartung