34

The problem

On our ASP .net website I keep getting wrong line numbers in stack traces of exceptions. I am talking about our live environment. There seems to be a pattern: The stack trace will always point to the line that contains the method's closing curly brackets.

For Example:

public class Foo
{
    public void Bar()
    {
        object someObject = null;
        someObject.ToString();
     /*
          arbitrarily more lines of code
     */
     } // this line will be the one that the stack trace points to
}

More details

  • To be clear: This does not only happen for some method(s), it happens for every exception that we log. So I would rule out (JIT) optimizations here, that might lead to line numbers being seemingly randomly off. What bothers me is, that the wrong line numbers seem to consistently point to the closing curly brackets of the containing method.

  • Note that before .net 4.6 there actually was a bug in the framework, such that if you compiled targeting x64, exactly this would happen. However, Microsoft confirmed that this has been fixed. Also running a minimal example app for this, reaffirms their claim. But for some reason it still happens on the web servers.

  • It puzzles me even more that it does not happen in our test and development environments. The test and live systems are pretty similarly set up. The only real difference is that live we are running Windows Server 2012, while on the test system we are still using Windows Server 2008.

What I have checked

  • pdb files are valid (tested with chkmatch)
  • Compile time code optimizations are not the issue
  • The aforementioned bug in .net before 4.6 cannot be reproduced with a minimal example
  • .NET versions are exactly the same
  • Build comes from the same deploy script

Any clues towards solving this problem are highly appreciated. If you know anything that we could check, please let me know.

timcbaoth
  • 669
  • 1
  • 7
  • 12
  • 8
    Is there the possibility that the live build is different than your development environments so that when comparing code you are getting incorrect line numbers. – CathalMF May 06 '16 at 12:17
  • Also is it possible that your code is executing on the live environment under a different .NET framework? – CathalMF May 06 '16 at 12:19
  • It doesn't sound quite like this problem that I had a while ago, but it is possible that it may be something similar that is causing it so might be worth looking into - http://stackoverflow.com/questions/34545285/mvc-razor-view-debugging-problems-mapping-incorrect-files-pdb-generation-bug – wizzardmr42 May 06 '16 at 12:19
  • https://dotnetfiddle.net/68ln5l doesn't seem to be the case here – Kritner May 06 '16 at 12:20
  • What about debug vs release builds? Could you have lines of code omitted due to `Debug` statements/regions? I don't really know how this would impact line numbers, or if it would. – Kritner May 06 '16 at 12:26
  • @CathalMF The build is exactly the same. We use the same deploy script for that. I also suspected that the .NET versions could be different but they are not. – timcbaoth May 06 '16 at 13:05
  • @timcbaoth Is there anyway you can attach visual studio to your live environment. – CathalMF May 06 '16 at 13:07
  • @CathalMF Unfortunately not. :( – timcbaoth May 06 '16 at 13:10
  • @Kritner No, there are no regions or debug. – timcbaoth May 06 '16 at 13:13
  • 4
    Going the other way around, can you clone your live environment into a dev environment for further testing? If you're not using a virtual server of some sort, there is a Sysinternals tool called Disk2Vhd (but I've not tried it myself) https://technet.microsoft.com/en-us/sysinternals/ee656415.aspx – Alex KeySmith Jul 13 '16 at 13:12
  • 1
    Do your methods contain multiple throw methods? This describes a gotcha with the .net framework using structured exception handling in windows. http://stackoverflow.com/a/2494150/141022 – Alex KeySmith Jul 13 '16 at 13:17
  • 2
    I think @AlexKeySmith's idea is a smart one. Copy the deployed application from your live environment to your test and see if it behaves the same. If the behavior is still the same (wrong line number) then you know it has something to do with your project / build config because the compiled code differs. If the behavior is now the same (line numbers are correct) then it is environment specific. This approach will narrow down where you have to look. – Igor Jul 13 '16 at 13:30
  • 2
    Side note - this is how our environments behave as well but it is expected because we compile with configuration `release` to our live servers and our build script forces flag 'Optimize code` so line numbers will never match. On our test servers we compile with configuration `debug` and the script disabled/removes flag Optimize code so line numbers are always matched in any stack trace. See [Common MSBuild Project Properties](https://msdn.microsoft.com/en-us/library/bb629394.aspx) flag `Optimize`. – Igor Jul 13 '16 at 13:36
  • Thanks for your suggestions! Those are pretty good ones. If you're reading this having a seemingly similar problem, the gotcha that Alex mentioned is a good place to start looking. Eventually we found out that it was the SCOM APM agent, which monitored the application on live servers, that caused the issue (see kopernik's answer). – timcbaoth Jul 14 '16 at 12:50

3 Answers3

26

Compilation optimization as well as runtime optimization can change the actual execution, as well as the call stack information constructed. So you cannot treat the numbers that seriously.

More information can be found in posts such as Scott Hanselman's: Release IS NOT Debug: 64bit Optimizations and C# Method Inlining in Release Build Call Stacks.

It would be too difficult to troubleshoot a specific case without touching your bits. But if you know of WinDbg you might dive deeper, by live debugging the application when the exception occurs. Then you can dump the actual jitted machine code as well as other runtime information, so as to determine how the line number is constructed.

Lex Li
  • 60,503
  • 9
  • 116
  • 147
  • 2
    > So you cannot treat the numbers that seriously. But this is basic debugability. We need the exact line numbers. Tools that do optimizaiton must emit some mapping so debugging and monitoring tools can find the right line number given a crash dump of sorts – Bruno Garcia Jun 25 '22 at 19:55
15

Thanks everyone for your help! We found out that the SCOM APM agent running on the machines in production caused our application to log wrong line numbers in the stack traces. As soon as we deactivated the agent, the line numbers were correct.

kopernik
  • 846
  • 1
  • 7
  • 10
  • It is a very interesting fact, but I guess this agent injects certain dlls into your process, which polluted the call stack and led to the offset. You might do some live debugging to confirm if you are familiar with tools such as WinDbg. – Lex Li Jul 21 '16 at 15:25
  • 1
    I would guess that the problem is stack unwinding. Inside SCOM you would be able to see any exception with correct(!) line numbers. So probably some process was injected that unwound the stack, leaving your application with an incomplete stacktrace. – timcbaoth Jul 26 '16 at 22:37
3

In latest .NET Core versions (aka .NET 6.0) if you have "Ready to run" JIT optimization enabled (PublishReadyToRun in publishing settings), this essentially renders line numbers inaccurate too.

Disabling it might help. But this comes as a performance trade obviously.

Alex from Jitbit
  • 53,710
  • 19
  • 160
  • 149