1

We have an MVC app that every know and then (anything from few times a day to once per 2-3 days) suffers a a site-wide hang (it doesn't respond to any requests and doesn't restart on its own either). We managed to reduce the number of suspects to a single page (whenever the site would suffer a hang this page would show up at the top of hanging requests). Knowing which page is the culprit doesn't help though, because we're unable to recreate the problem on development machine. In fact we can't even recreate the problem on the production machine (most of the time, simply opening the faulty page doesn't break anything).

What we know is that, sometimes when a user goes to this page (controller action) the site will hang and we know there isn't much in the way the request is being made that could contribute the the hang (from IIS logs we know how user got to the faulty page and that we're dealing with a simple GET with no query string params).

We would like to know where exactly in the code the hang happens, but we don't know how to get such information out of the app/worker thread/IIS/Windows Server. No exception is saved to Windows log, nor is our built into the app logger picking any exceptions (probably has something to do with the hang).

What are the ways to tell what exactly is an IIS worker thread doing at a given time (like get the corresponding file and line of code)?


P.S. I described the exact symptoms of the site-wide hang in another question, but those are irrelevant to this question. Plus the question ended up being too broad which only brought generic answers. P.S.2 We found the offending page by looking into Worker Processes\Requests in IIS.

jahu
  • 5,427
  • 3
  • 37
  • 64
  • Attach the debugger, or add logging. – CodeCaster Jul 18 '16 at 15:04
  • Step 1 - get a complete memory dump. Can be done with task explorer or a debugging tool. Step 2 - use [Debug Diagnostic Tool](https://www.microsoft.com/en-us/download/details.aspx?id=49924) to analyze the memory dump OR use Windbg (if you are new to this try the 1st option first). It is relatively simple to get a complete listing of threads and also call stacks for each thread. You have to create the memory dump when you think the web app is no longer being responsive. – Igor Jul 18 '16 at 15:05
  • Side note - that `Debug Diagnostic Tool` app is very well made, download it and try it out. Its got a full GUI so no need to start learning commands. It does take a little effort to analyze the results but figuring something like this out is usually not easy work regardless of how you approach it. – Igor Jul 18 '16 at 15:07
  • @Igor We tried to make a complete memory dump when the site hangs using DebugDiag, but it fails for some reason. The best we can get is a mini dump from DebugDiag. I guess I could try creating a dump using the task manager (should be possible to make one for the offending thread rather than the whole app). I'll try that the next time the app suffers the hang. – jahu Jul 18 '16 at 15:09
  • Why does it fail? Maybe you are not running it with admin rights or maybe there is not enough disk space? Alternatively you can connect to the running app but this will freeze it for all other users that are connecting (not sure if that is a big deal or not). Alternatively try creating a dump from windows Task Manager using right click. – Igor Jul 18 '16 at 15:10
  • @Igor The server is a little short on HDD space, but a dump file the size of the memory the site was using at the time does get created. However DebugDiag informs that it has failed and any attempts to analyze the faulty dump result in a report saying that the dump is faulty. Additionally when trying to preform the full dump, DebugDiag freezes for a minute or 2 and during this time the app will restart itself (something it fails to do automatically otherwise). – jahu Jul 18 '16 at 15:19
  • If you want to know the exact state of your application when it hangs you will need a memory dump or debug it against the live process using WinDbg. Again, you can use the standard windows task explorer to create a dump. Alternatively install Windbg and debug it live. – Igor Jul 18 '16 at 15:22
  • @Igor Inspired by Joseph Rosson's answer, we wrote a simple logger and logged every major operation inside the controller action. It turned out we had a non-obvious problem in a view that would reveal itself if certain (impossible to recreate and detect) conditions were met. Looks like, if something goes terribly wrong inside a view, it can bring down a whole site, instead of just throwing an exception. – jahu Jul 21 '16 at 11:11

1 Answers1

3

In the past I've had the same issue and what really quickly helped me was the ability to inject a line of tracing code int the DLL(s) using Mono.Cecil. The code below injects tracing into the unsigned version of the DLL's and outputs a new DLL that can be used to create an "indented trace" into the code by injecting logging lines in the start (and end if needed...not shown below) of each method so that you can time the entry and exist of each call. There are a lot of third party tools but this is easy (a days work at best), gives full control to the developer, and as a bonus, is free.

You also need to create a DLL (IISStackTraceProvider below) that has the trace class and the static call "TraceStep" which can be used to log the data. To reconstruct that data across worker processes and threads, you associate a GUID and stopwatch with each BeginRequest/EndRequest using the "HttpContext.Items Property". Since your call hangs...you want to trace any call that "Begin" but never "End" or Ends with a time out and toss the rest away to keep things fast.

I've tested this against ~one million calls an hour/per server in a web farm in our production environment without affecting performance but be mindful of what requests you want to log and which requests get thrown away. Also, I used Redis to capture the logs since write times are insanely fast and also it's free, then just read the Redis data once I trap the issue.

class TraceInjection
{
    private ELogLevel logLevel;
    public enum ELogLevel
    {
        eLow,
        eMid,
        eHigh
    }

    public TraceInjection(ELogLevel logLevel)
    {
        this.logLevel = logLevel;
    }

    public bool InjectTracingLine(string assemblyPath, string outputDirectory)
    {
        CustomAttribute customAttr;
        AssemblyDefinition asmDef;

        // New assembly path
        string fileName = Path.GetFileName(assemblyPath);

        string newPath = outputDirectory + "\\" + fileName;

        // Check if Output directory already exists, if not, create one
        if (!Directory.Exists(outputDirectory))
        {
            Directory.CreateDirectory(outputDirectory);
        }

        ModuleDefinition modDefCopy = null;
        TypeDefinition typDefCopy = null;

        try
        {
            var resolver = new DefaultAssemblyResolver();
            resolver.AddSearchDirectory(System.IO.Path.GetDirectoryName(assemblyPath));

            var parameters = new ReaderParameters
            {
                AssemblyResolver = resolver,
            };

            // Load assembly
            asmDef = AssemblyDefinition.ReadAssembly(assemblyPath, parameters);

            String functionsFound = "";

            foreach (var modDef in asmDef.Modules)
            {
                modDefCopy = modDef;
                foreach (var typDef in modDef.Types)
                {
                    typDefCopy = typDef;                        
                    foreach (MethodDefinition metDef in typDef.Methods)
                    {
                        try
                        {
                            // Skipping things I personally don't want traced...
                            if (metDef.IsConstructor || 
                                metDef.IsAbstract ||
                                metDef.IsCompilerControlled ||
                                metDef.IsGetter ||
                                metDef.IsSetter
                                ) continue;

                            functionsFound += String.Format("{0}\r\n", metDef.Name.Trim());


                            // Get ILProcessor
                            ILProcessor ilProcessor = metDef.Body.GetILProcessor();

                            /*** Begin Method ******/
                            // Load fully qualified method name as string
                            Instruction i1 = ilProcessor.Create(
                                OpCodes.Ldstr,
                                String.Format(">,{0},{1}", metDef.Name.Replace(",", ""), asmDef.Name.Name)
                            );
                            ilProcessor.InsertBefore(metDef.Body.Instructions[0], i1);

                            // Call the method which would write tracing info
                            Instruction i2 = ilProcessor.Create(
                                OpCodes.Call,
                                metDef.Module.Import(
                                    typeof(IISStackTraceProvider).GetMethod("TraceStep", new[] { typeof(string) })
                                )
                            );
                            ilProcessor.InsertAfter(i1, i2);

                        }catch(Exception ex)
                        {
                            // ...
                        }
                    }
                }
            }

            Console.Write(functionsFound);
            Console.ReadKey();

            // Save modified assembly
            asmDef.Write(newPath, new WriterParameters() { WriteSymbols = true });
        }
        catch (Exception ex)
        {
            modDefCopy = null;
            typDefCopy = null;

            // Nothing to be done, just let the caller handle exception 
            // or do logging and so on
            throw;
        }

        return true;
    }

    public bool TryGetCustomAttribute(MethodDefinition type, string attributeType, out CustomAttribute result)
    {
        result = null;
        if (!type.HasCustomAttributes)
            return false;

        foreach (CustomAttribute attribute in type.CustomAttributes)
        {
            if (attribute.Constructor.DeclaringType.FullName != attributeType)
                continue;

            result = attribute;
            return true;
        }

        return false;
    }
}
Joseph Rosson
  • 334
  • 1
  • 8
  • This is a really good answer. We simplified it greatly, by just writing logs to a text file and skipping the part about detecting when a requests completes and just writing everything there (we didn't had that many requests anyway). Thanks to this we were able to pinpoint the problem. It was a faulty custom unobtrusive validator that was causing Razor to hang (an subsequently blocking further requests from completing). – jahu Jul 21 '16 at 11:04