
I have a system that starts producing incorrect values after running for several hours. I reproduced it under a debugger and found that System.Math.Round starts returning incorrect values.

I have two instances of the same version of Visual Studio, running side by side on the same machine, with the same project, the same code, at the same part of the stack trace -- everything is identical -- except one has been running for hours and has begun failing, the other hasn't.

I execute a constant expression in their respective Immediate windows, and get different values.

In the good run:

[Screenshot: Immediate window result in the good run]

In the bad run:

[Screenshot: Immediate window result in the bad run]

This small discrepancy has significant implications for my app.

.NET version, dumped from the running code:

System.Environment.Version => 4.0.30319.42000

(typeof(string).Assembly.GetCustomAttributes(typeof(AssemblyFileVersionAttribute), false))[0] => 4.8.4644.0

Has anyone seen this before? Is it a known bug? Is there a way I can work around it?


EDIT: @Kit doesn't trust the Immediate Window so here's a bit more info. I showed the Immediate Window result because it lets you see that the same constant expression is producing different results from Math.Round. Below is the line in the actual code where it's relevant, and you can see that Math.Round is producing the wrong value in the actual code, too:

[Screenshot: debugger showing Math.Round producing the wrong value in the actual code]

Cole Tobin
Mud
  • So you have a const expression (something like `const double val = 176.56397878397178;`) and you call `Math.Round` on it and get two different results? If you `Debug.WriteLine($"{val:F20}");`, what does it look like? – Flydog57 Jun 28 '23 at 18:33
  • @Flydog57 Yes. Not even using a const variable, I'm using a float *literal* in the expression. In the actual code, it's a variable (I added a screenshot to my post). I used a literal in the Immediate Window so you could see for yourself that it's not the input that's changing, it's the output. – Mud Jun 28 '23 at 19:39
  • The bitness of your process matters a lot, I'll guess at 32-bit (aka x86). In which case Round() is implemented by the FRNDINT fpu instruction. Whose behavior is affected by the rounding mode selected in the FPU control register. Use Debug > Windows > Registers, right-click that tool window and tick "Floating point". A .NET program must always be operating with CTRL = 027F. Step through your program and when you see it change then you found the Evil Code. A simple way to reset the control register is by intentionally throwing an exception and catching it. – Hans Passant Jun 28 '23 at 19:57
  • @HansPassant Amazing! The good instance has `CTRL = 027F`, the bad has `CTRL = 067F`! Do you know what could cause that rounding mode to change? It's untenable to "step through the code" to find where it changes. It's tens of thousands of lines of code, iterating through tens of thousands of records and performing the same operation on each of them, and it's only after it has run for hours (or one particular bad record is hit, possibly) that we suddenly get into this state. BTW I modified a value in the bad instance to cause an exception. After being caught and handled, I still have CTRL = 067F. – Mud Jun 28 '23 at 20:44
  • Yes, 067F means it changed from "round to nearest" to "round down". Definitely the cause. I can't help you find it without anything to look at. Suspect a library that pinvokes to native code. How to pinvoke _controlfp() to restore the control word is described [here](https://stackoverflow.com/questions/18811466/how-do-i-force-the-c-sharp-compiler-to-throw-an-exception-when-any-math-operatio). – Hans Passant Jun 28 '23 at 21:44
  • @HansPassant: you should summarize what you suggested and what Mud found out as an answer. This will be a damn fine SO question with that answer. – Flydog57 Jun 29 '23 at 02:53
  • Related: "some libraries ... change [rounding] and sometimes (or always) fail to restore it" https://stackoverflow.com/a/10343425/1462295 || "changing the rounding mode is sticky" https://stackoverflow.com/a/64920804/1462295 || "if somebody changes the per-thread precision settings your results may be rounded to a different precision than expected" https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/ . Maybe related: https://stackoverflow.com/a/25206025/1462295 – BurnsBA Jun 29 '23 at 13:49
  • Epic find, @HansPassant. Instead of searching for the evil, maybe just write a helper method that sets the value correctly and call it right before your `.Round()` call. – Oliver Jun 29 '23 at 15:11
  • @Oliver That's what I'm doing. Fixed my problem (and just in time). – Mud Jun 29 '23 at 15:44
  • @HansPassant You saved my ass! Post it as an answer and I'll accept it, otherwise I'll post an answer on your behalf. – Mud Jun 29 '23 at 15:45
  • @Tudeschizieuinchid *This* is why it's reasonable to ask a question without a reproduction case. That would have been *literally impossible* in this case, but I had a strongly isolated and very unusual *symptom* that I wanted to put in front of the eyes of experts. One such expert identified the root cause and fixed my problem. This question and answer would have saved me days if it was here when I started. That's what this site is for. – Mud Jun 29 '23 at 15:49
  • @HansPassant - You are a steely-eyed .NET man. – Enigmativity Jun 29 '23 at 23:41

1 Answer


@HansPassant identified the problem in the comments:

Hans: "The bitness of your process matters a lot, I'll guess at 32-bit (aka x86). In which case Round() is implemented by the FRNDINT fpu instruction. Whose behavior is affected by the rounding mode selected in the FPU control register. Use Debug > Windows > Registers, right-click that tool window and tick "Floating point". A .NET program must always be operating with CTRL = 027F. Step through your program and when you see it change then you found the Evil Code."

This is exactly right. It's a 32-bit .NET app, which apparently means it uses the x87 FPU rather than SSE instructions. These were the floating-point registers in the good instance vs. the bad:

[Screenshot: floating-point registers in the good instance, CTRL = 027F]

[Screenshot: floating-point registers in the bad instance, CTRL = 067F]

Hans: "067F means it changed from 'round to nearest' to 'round down'."
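To see why those two values differ only in rounding mode: in the x87 control word, bits 10 and 11 form the rounding-control field. The helper below is a minimal sketch for decoding that field from the CTRL value shown in the Registers window; the class and method names are hypothetical and were not part of my app.

```csharp
using System;

public static class X87ControlWord
{
    // Hypothetical helper: decodes the rounding-control (RC) field,
    // which occupies bits 10-11 of the x87 FPU control word.
    public static string DescribeRounding(int controlWord)
    {
        int rc = (controlWord >> 10) & 0x3;
        switch (rc)
        {
            case 0: return "round to nearest";
            case 1: return "round down (toward negative infinity)";
            case 2: return "round up (toward positive infinity)";
            default: return "truncate (toward zero)";
        }
    }
}
```

With this, `X87ControlWord.DescribeRounding(0x027F)` gives "round to nearest" and `X87ControlWord.DescribeRounding(0x067F)` gives "round down (toward negative infinity)", which matches what Hans described.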

This code base only needed to run successfully once before being retired, so I didn't try to find which unmanaged dependency was changing this flag and when. Instead, I just added something like this to my app and called it before doing important bits of work:

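    // Requires: using System.Runtime.InteropServices;
    // _controlfp is a cdecl function in the C runtime, so the calling convention is stated explicitly.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int _controlfp(int IN_New, int IN_Mask);

    public static void VerifyFpuRoundingMode()
    {
        const int _MCW_RC  = 0x00000300; // mask for the rounding-control bits
        const int _RC_NEAR = 0x00000000; // round to nearest
        int ctrl = _controlfp(0, 0);     // a mask of 0 reads the control word without changing it
        if ((ctrl & _MCW_RC) != _RC_NEAR)
        {
            // Something switched the rounding mode; put it back to round-to-nearest.
            _controlfp(_RC_NEAR, _MCW_RC);
        }
    }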

This fixed our rounding issues.
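If you do need to find the offending dependency rather than just paper over it, the same `_controlfp` call can be used to narrow things down. What follows is only a sketch and not something I ran in my app: `FpuGuard` and `Run` are hypothetical names, and it assumes a Windows process where `msvcrt.dll` is available. It wraps a suspect call, reports whether the rounding-control bits changed across it, and restores them. The control word is per-thread, so the check has to run on the same thread as the suspect call.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FpuGuard
{
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern int _controlfp(int newControl, int mask);

    private const int _MCW_RC = 0x00000300; // mask for the rounding-control bits

    // Hypothetical helper: runs the action, logs if it left the rounding
    // mode changed on the current thread, and restores the previous mode.
    public static void Run(string label, Action action)
    {
        int before = _controlfp(0, 0) & _MCW_RC; // mask 0 = read only
        try
        {
            action();
        }
        finally
        {
            int after = _controlfp(0, 0) & _MCW_RC;
            if (after != before)
            {
                Console.Error.WriteLine(
                    $"{label}: rounding control changed from 0x{before:X} to 0x{after:X}");
                _controlfp(before, _MCW_RC); // put the previous mode back
            }
        }
    }
}
```

Wrapping each call into an unmanaged library, for example `FpuGuard.Run("suspect call", () => { /* call into the native dependency here */ });`, should point at the one that leaves the mode changed.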

Mud