
I have a production issue with In-Proc session state.

Our application is based on ASP.NET MVC 3 and is integrated into our site, which runs Sitecore CMS.

Our users have been experiencing "Object reference not set to an instance of an object" errors randomly throughout the application flow.

After extensive logging and tracing, we concluded that the error occurs when the session object returns null.

Here are some details about what we found and what we know:

  1. The session ID persists for the same user and is passed all the way into the application correctly.
  2. I don't believe this is a code issue, because it only happens in production, at random intervals, and never in the local, dev, or staging environments.
  3. There are two production servers behind a load balancer.
  4. It is not a server persistence issue: we tested by taking one of the servers out of rotation so that all traffic was routed to the other one. Our logging also shows that users keep hitting the same server, yet their session becomes null.
  5. It doesn't seem to be a client issue either, because users are able to go through the application successfully even after encountering the error.
  6. It doesn't seem to be a traffic or server load issue, because it happens throughout the day at random times, to random users.
  7. It doesn't seem to be caused by app pool recycling.
  8. It doesn't seem to be caused by session timeout: we set the timeout to two hours, and our logs show users hitting the error 5-10 minutes into the flow.

Side note: we must use In-Proc session state because of Sitecore CMS, so changing the design is not an option.

I have a theory that it might have something to do with session locking, or with the session being corrupted by concurrent access attempts.

Two places where we see this problem a lot are where users are redirected by JavaScript (`window.location`) and areas where asynchronous Ajax calls are made.

We've been scratching our heads over this for a while. Does anyone out there have any insight or theory as to what the problem might be?

Thanks

Added Note:

@Mystere Man and @H27studio, I've also discovered something related to session ID or session reset issues. In some cases, a page redirect triggers two duplicate GET calls to the method; the first call is missing the session ID and gets routed to a random server (the load balancer's server persistence is based on client IP, session ID and other header information, which it uses to keep a client on one server). This happens every time our redirect page uses `window.location`.

This causes the "Object reference not set..." error for the client if the bad call without a session ID hits the same server. (Probably because the first, bad call without a session ID makes the application create a new session, which overrides the original session's object.) So even on the second call, where the correct session ID is passed into the application, we find that the session object contains null.

So I believe the duplicate call is clearing out the session object, though I'm not sure why, or what causes the duplicate call in the first place.

Does anyone have a clue about this? Thanks

Update: we are planning to take these steps in the hope of resolving this issue:

  1. We have issues in areas where async Ajax calls are made, so we plan to remove the async option and let the Ajax calls run synchronously.
  2. We have issues where a `window.location` JavaScript redirect happens. We have created an alternative method using a postback, in the hope of fixing the issue in this area.
  3. Other areas, which aren't related to either of the above, are still up in the air.

We will post the effect of these changes once we deploy them to production.

Thanks for all the comments.

Jun Zheng
  • Don't trust the session timeout. If the server needs more memory, it will free sessions. I have 1 hour set at my job, and still most people lose their sessions before 20 minutes, sometimes in 5-10. (And it's a machine with 69 GB of RAM, and not a lot of traffic...) – H27studio Jan 25 '12 at 21:59
  • My company has been scratching our heads over this as well, even after `elmah` logging, etc using `inproc`... – Luke Hutton Jan 25 '12 at 22:00
  • @H27studio, do you have a reference for "If the server needs more memory it will free the sessions"? – Luke Hutton Jan 25 '12 at 22:02
  • Btw, are you really sure the load balancer is not sending users from one server to another at high-load times? It really sounds like it. – H27studio Jan 25 '12 at 22:04
  • How do you use inproc session state with two servers? – Jan Jan 25 '12 at 22:07
  • @Davi, apparently not, given this in the OP's question: "So changing the design is not an option." I'm assuming the load balancer supports sticky sessions for the OP; can you confirm? – Luke Hutton Jan 25 '12 at 22:07
  • @H27studio The issue sounded an awful lot like a load balancer issue at first, but we put performance logging in place at the application level, and it shows that all the calls are made to the same server. The load balancer log also confirms that. – Jun Zheng Jan 25 '12 at 22:08
  • @Jan The load balancer keeps our users directed to the same server using its own session persistence, so we can use In-Proc session state. – Jun Zheng Jan 25 '12 at 22:10
  • @Davi, we thought about out-of-process session state options like SQL Server or the session state service, but Sitecore CMS doesn't support them, so we must stick with In-Proc session state. Also, we have a very complex model that would be very difficult to serialize for out-of-process session state. – Jun Zheng Jan 25 '12 at 22:13
  • @LukeHutton I'm looking for an article; the problem is that I don't remember the name of this... :-) But IIS does it to save memory on demand. It kills sessions because it always tries to keep a certain amount of RAM free. – H27studio Jan 25 '12 at 22:14
  • Have you logged Session_Start and Session_End events? Even if the session ends, the user would get a new session on the next request. – Jan Jan 25 '12 at 22:15
  • @H27studio, as for the session being cleared before the timeout, wouldn't that kill the session ID as well? The problem we have is that a user's session ID stays the same, but randomly throughout the flow the session object just appears to be null. – Jun Zheng Jan 25 '12 at 22:18
  • @Sti88 I'm not sure, but I think it should delete everything and then create a new session ID on the next request... – H27studio Jan 25 '12 at 22:59
  • @STi88 - SessionID can be reused by asp.net, and is the source of the Session Fixation attack vulnerability. That means that you can't rely on the session id changing between two sessions of the same browser, even if the session was abandoned. – Erik Funkenbusch Jan 25 '12 at 23:47
  • @Mystere Man and @H27studio, I've added an additional note regarding your comments. Thanks for the comments – Jun Zheng Jan 26 '12 at 00:05
  • Then it's not what I was thinking. It seems like a JS error when redirecting with `window.location`. Could it be the usual suspect (IE6), or some other strange browser, making bad requests? By the way, @LukeHutton, I found the name of what I was talking about: it's called scavenging. It happens to free cache elements, but it can also delete sessions; it's not an IIS thing, it's the OS. – H27studio Jan 26 '12 at 10:04

2 Answers

After months of searching and debugging, I think we have finally come to a conclusion. There seems to be a bug with the Sitecore Analytics robots session timeout. We first noticed that the random session losses were due to sessions timing out prematurely, and then we noticed that these sessions were being set to a 1-minute timeout instead of 120 minutes.

After searching through all the config files, we noticed that the Sitecore `Analytics.Robots.SessionTimeout` setting was the only timeout value set to 1 minute.

Increasing this value solved our session timeout problem.

So the fundamental problem is that Sitecore Analytics misidentifies some visitor sessions as robot sessions and reassigns their timeout to 1 minute. This is probably a bug worth reporting.
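This diagnosis also explains the symptoms in the question: the same session ID on every request, but the session data suddenly null 5-10 minutes into the flow. The standalone model below is only a sketch under stated assumptions, not ASP.NET's or Sitecore's code. `InProcSessionModel`, `Acquire`, `SetTimeout`, `DataSurvives` and the `"Model"` key are hypothetical; the model assumes sliding expiration, and that a request carrying an expired session's ID gets a fresh, empty session under the same ID (consistent with Erik Funkenbusch's comment that ASP.NET can reuse session IDs). `SetTimeout` stands in for something changing a live session's `Session.Timeout`, which is what the robot detection appeared to do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of an in-process session store with sliding
// expiration. NOT ASP.NET's real store; it only mimics the behaviour described above.
public sealed class InProcSessionModel
{
    private sealed class Entry
    {
        public readonly Dictionary<string, object> Items = new Dictionary<string, object>();
        public DateTime LastAccess;
        public TimeSpan Timeout;
    }

    private readonly Dictionary<string, Entry> _store = new Dictionary<string, Entry>();

    // Returns the items for sessionId. If there is no live session, a new, empty one
    // is created under the SAME id (assumed ID reuse) with newTimeout.
    public Dictionary<string, object> Acquire(string sessionId, DateTime now, TimeSpan newTimeout)
    {
        Entry entry;
        bool expired = _store.TryGetValue(sessionId, out entry)
                       && now - entry.LastAccess > entry.Timeout;
        if (entry == null || expired)
        {
            entry = new Entry { Timeout = newTimeout };
            _store[sessionId] = entry;
        }
        entry.LastAccess = now; // sliding expiration: each request restarts the clock
        return entry.Items;
    }

    // Hypothetical stand-in for code (here, robot detection) changing a live session's timeout.
    public void SetTimeout(string sessionId, TimeSpan timeout)
    {
        _store[sessionId].Timeout = timeout;
    }
}

public static class Demo
{
    // Store a model, have the timeout changed to `timeout`, stay idle for `idle`,
    // then make the next request. Returns true if the stored model survived.
    public static bool DataSurvives(TimeSpan timeout, TimeSpan idle)
    {
        var store = new InProcSessionModel();
        var start = new DateTime(2012, 1, 25, 9, 0, 0);
        const string id = "abc123"; // the browser sends the same cookie value every time

        store.Acquire(id, start, TimeSpan.FromMinutes(120))["Model"] = "checkout state";
        store.SetTimeout(id, timeout);

        var items = store.Acquire(id, start + idle, TimeSpan.FromMinutes(120));
        return items.ContainsKey("Model");
    }

    public static void Main()
    {
        Console.WriteLine(DataSurvives(TimeSpan.FromMinutes(120), TimeSpan.FromMinutes(5))); // True
        Console.WriteLine(DataSurvives(TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(5)));   // False
    }
}
```

With the configured 120 minutes, five minutes on a page is harmless; once the timeout silently drops to 1 minute, the next request keeps its session ID but finds an empty session, so reading `Session["Model"]` would return null.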

Update: Response from Sitecore:

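Sitecore CMS was designed to be used with ASP.NET WebForms technology. With WebForms, bot detection relies on the visitor identification control in the `<head>` of the page. Naturally, you can't use that control in an ASP.NET MVC application, but there is an easy solution: put the following code inside the `<head>` element:

```aspx
<%
// Replacement for the WebForms visitor identification control in an MVC view.
// Names are fully qualified because inside a view/page, a bare "Context" resolves to the page's HttpContext.
if (Sitecore.Context.Diagnostics.Tracing || Sitecore.Context.Diagnostics.Profiling)
{
  // Skip visitor identification while Sitecore tracing/profiling is active.
  Response.Write("<!-- Visitor identification is disabled because debugging is active. -->");
}
else if (Sitecore.Analytics.Tracker.IsActive && Sitecore.Analytics.Tracker.Visitor.VisitorClassification == 925)
{
  // Only for visitors with classification 925 (the value in Sitecore's response): emit a stylesheet link
  // to VisitorIdentification.aspx, which a real browser requests, presumably letting Sitecore treat it as human.
  Response.Write("<link href=\"/layouts/System/VisitorIdentification.aspx\" rel=\"stylesheet\" type=\"text/css\" />");
}
%>
```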
Jun Zheng

I think your issue may be the async Ajax calls you allude to. I recently read an article by David Hayden that talked about concurrent Ajax requests in the same session causing problems. It's something to look at, anyway. Hopefully it helps.

http://davidhayden.com/blog/dave/archive/2011/02/09/SessionLessControllersMvc3.aspx

He talks about it right at the end of the post.

Perry
  • Ajax requests are not causing problems; when session state is enabled, they just get executed one after another on the server to prevent parallel access. That's a performance issue, and unrelated to the OP's question about disappearing sessions. – Jan Jan 25 '12 at 22:19
  • I've read this article and thought it might be an issue as well, but as Jan stated, session state gets locked when it is in read/write mode, so concurrent Ajax requests just have to execute in order. That shouldn't corrupt the session. Even if this were the cause, I would expect it to happen constantly rather than at random intervals. But this is one area I'm less confident in; maybe someone has a good method for verifying whether this is the issue? Thanks – Jun Zheng Jan 25 '12 at 22:24
  • A performance issue, yes, but he also mentioned that the session may become corrupt, which is why I brought it up. Just trying to help. – Perry Jan 25 '12 at 22:34
  • Yes, but ASP.NET protects session state from being corrupted by serializing the async calls when session state is enabled on the controller. – Jan Jan 25 '12 at 22:35
  • I understand that now. Should have absorbed the article a little more before posting. Thanks. – Perry Jan 26 '12 at 21:17
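Jan's point, that concurrent requests for one session are queued rather than allowed to corrupt it, can be illustrated with a small standalone model. This is a sketch, not ASP.NET's implementation: `SessionLockModel`, `HandleRequest`, `MaxActive` and the session ID `abc123` are hypothetical, and a `SemaphoreSlim(1, 1)` per session ID stands in for the exclusive lock the session state module takes for requests with read/write session access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model (NOT ASP.NET's code) of the exclusive per-session lock
// taken for requests with read/write session state.
public static class SessionLockModel
{
    // One gate per session ID; SemaphoreSlim(1, 1) admits a single request at a time.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> Gates =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    private static readonly object StatsLock = new object();
    private static int _active; // requests currently inside HandleRequest (demo uses one session ID)
    public static int MaxActive { get; private set; } // peak value of _active

    public static void HandleRequest(string sessionId)
    {
        SemaphoreSlim gate = Gates.GetOrAdd(sessionId, _ => new SemaphoreSlim(1, 1));
        gate.Wait(); // later requests for the same session block here: queueing, not corruption
        try
        {
            lock (StatsLock)
            {
                _active++;
                MaxActive = Math.Max(MaxActive, _active);
            }
            Thread.Sleep(20); // simulated work that reads/writes session data
        }
        finally
        {
            lock (StatsLock) { _active--; }
            gate.Release(); // let the next queued request for this session proceed
        }
    }

    public static void Main()
    {
        // Five "async Ajax calls" fired at once, all carrying the same session ID.
        Task[] calls = Enumerable.Range(0, 5)
            .Select(_ => Task.Run(() => HandleRequest("abc123")))
            .ToArray();
        Task.WaitAll(calls);
        Console.WriteLine("Peak concurrent requests in one session: " + MaxActive); // 1
    }
}
```

With the gate in place the peak stays at 1; removing the `Wait`/`Release` pair would typically let it climb above 1, which is the parallel access the lock prevents. The queueing is the performance cost Jan mentions, which is also why making the Ajax calls synchronous would not be expected to change whether the session disappears.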