
Note: The code works in Windows 11 (22000.1098) and earlier, but causes a stack overflow exception on Windows 11 (22621.525).

I have a bug that is causing massive problems in a C# program I'm in charge of. The program works well in earlier Windows versions (including earlier builds of Windows 11), and it also works in debug builds, but it throws an exception in the release build. I have drilled down to the individual functions where the problem appears, and it's weird.

The code is something like this:

MySettings.MySetting setting = new MySettings.MySetting()
{
    Value = Double.NaN,
    Values = new double[] { Double.NaN },
    Special = ""
};

But if I change Double.NaN to a numeric value, the code works.

I also tried splitting it into separate assignments:

MySettings.MySetting setting = new MySettings.MySetting();
setting.Value = Double.NaN;
setting.Values = new double[] { Double.NaN };
setting.Special = "";

Removing Special changes nothing, but if either Value or Values is NaN, it throws an exception.

The settings class in minimal form is

public class MySetting
{
    public Double Value { get; set; }
    public string Special { get; set; }
    public Double[] Values { get; set; }
    public ValueSpecial[] Specials { get; set; }

    public ValueSpecial AddSpecial(string code, string value)
    {
        ValueSpecial special = new ValueSpecial() { Code = code, Value = value };
        return special;
    }

    public void ForgetSpecial(string code)
    {
    }
    
    public override string ToString()
    {
        return "Not today";
    }
}

ValueSpecial is very simple

public class ValueSpecial
{
    public string Code { get; set; }
    public string Value { get; set; }

    public override string ToString()
    {
        return "dummy";
    }
}
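The snippets above don't compile exactly as written (Jon Skeet points this out in the comments). A minimal self-contained console sketch that combines them might look like the following. It assumes `MySettings` is a namespace and omits `AddSpecial`/`ForgetSpecial`, which never touch the NaN values. On a healthy runtime this just prints the stored values; per the comments, a minimal program like this did not reproduce the crash.

```csharp
using System;

namespace MySettings
{
    public class ValueSpecial
    {
        public string Code { get; set; }
        public string Value { get; set; }
        public override string ToString() => "dummy";
    }

    // Trimmed copy of the class from the question (assumed to live in namespace MySettings).
    public class MySetting
    {
        public double Value { get; set; }
        public string Special { get; set; }
        public double[] Values { get; set; }
        public ValueSpecial[] Specials { get; set; }
        public override string ToString() => "Not today";
    }
}

public static class Program
{
    public static void Main()
    {
        // Same assignments as in the question; NaN is an ordinary double value here.
        var setting = new MySettings.MySetting
        {
            Value = double.NaN,
            Values = new[] { double.NaN },
            Special = ""
        };

        Console.WriteLine(double.IsNaN(setting.Value));     // True
        Console.WriteLine(double.IsNaN(setting.Values[0])); // True
        Console.WriteLine(setting);                         // Not today
    }
}
```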

The call stack looks innocent: Main() -> MainForm() -> InitializeComponent() -> MyControl(), so there doesn't appear to be any recursion going on.

What has Microsoft changed in this version of Windows and how do I get around it?

Please note that the same binary works on Windows 11 22000.1098, so don't just focus on the code.

It looks like https://stackoverflow.com/a/25208200/1771388 may give the answer.
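The comments below (Hans Passant, Jeroen Mostert) suggest that some component has changed the FPU control word so that NaNs raise exceptions. A minimal diagnostic sketch, assuming a Windows process where `msvcrt.dll` is available, reads the control word through the C runtime's `_controlfp` without changing it. The class and method names are hypothetical; the mask constants are the `_MCW_EM` and `_EM_INVALID` values from the MSVC `float.h`.

```csharp
using System;
using System.Runtime.InteropServices;

public static class FpuDiagnostics
{
    // _controlfp from the Microsoft C runtime (Windows only).
    // Passing mask = 0 changes nothing and just returns the current control word.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    private static extern uint _controlfp(uint newControl, uint mask);

    // _MCW_EM: all exception-mask bits. A set bit means that exception is masked (no trap).
    private const uint MCW_EM = 0x0008001F;
    // _EM_INVALID: the invalid-operation exception mask bit.
    private const uint EM_INVALID = 0x00000010;

    public static uint ReadControlWord() => _controlfp(0, 0);

    // Hypothetical helper: print the control word and whether any exceptions are unmasked.
    public static void Report(string where)
    {
        uint cw = ReadControlWord();
        bool allMasked = (cw & MCW_EM) == MCW_EM;
        bool invalidUnmasked = (cw & EM_INVALID) == 0;
        Console.WriteLine(
            $"{where}: control word 0x{cw:X8}, all FP exceptions masked: {allMasked}, invalid-operation unmasked: {invalidUnmasked}");
    }
}
```

Calling something like `FpuDiagnostics.Report("before MyControl")` just before the failing assignment, on both Windows builds, would show whether the masks differ at that point.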

liftarn
  • What's the definition of `MyItem`? – Matthew Watson Oct 17 '22 at 11:01
  • Which threw the exception, `Add`, `Value` or `Values`? – shingo Oct 17 '22 at 11:01
  • Maybe it has to do with checked/unchecked statements. See : https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/checked-and-unchecked?force_isolation=true – jdweng Oct 17 '22 at 11:06
  • And what is the definition of `MySetting`? – Klaus Gütter Oct 17 '22 at 11:10
  • @MatthewWatson I removed it and it still throws an exception, so it was innocent. I have updated the example. – liftarn Oct 17 '22 at 11:59
  • @shingo Add() was innocent. Value or Values throws an exception if set to NaN. Normal numbers work. – liftarn Oct 17 '22 at 12:00
  • Please provide a [mcve] - we shouldn't have to guess about *anything* here. – Jon Skeet Oct 17 '22 at 12:03
  • @JonSkeet I have done so. – liftarn Oct 17 '22 at 12:11
  • @jdweng That appears to be for arithmetic overflows, not stack overflows. I also tested and it made no difference. – liftarn Oct 17 '22 at 12:17
  • Are `Value` and `Values` really auto implemented properties? – shingo Oct 17 '22 at 12:28
  • @shingo As far as I can tell. But the interesting thing is that it works in earlier versions of Windows. They may have added some error checks that caught something. – liftarn Oct 17 '22 at 12:35
  • https://stackoverflow.com/questions/25205112/testing-for-a-float-nan-results-in-a-stack-overflow – Hans Passant Oct 17 '22 at 12:37
  • @HansPassant Yes, I suspected that, but all are Double.IsNaN() and a few Number.CheckIfNaN(); – liftarn Oct 17 '22 at 12:45
  • Per Hans, this may well be an unintended change of the FPU control word causing NaNs to throw exceptions. To further zoom in on this problem you'd need to use *clean* installations of Windows and run your release build standalone, because it's likely a third-party component spoiling the party (one that may unceremoniously inject itself into every application, even). It's not impossible that it's truly Windows alone that's responsible (with it providing the component that is twiddling FPU settings), but it is less likely. – Jeroen Mostert Oct 17 '22 at 12:50
  • This definitely isn't a [mcve] - it doesn't include the `ValueSpecial` type. We should be able to copy/paste/compile/run the code without *anything* else. – Jon Skeet Oct 17 '22 at 12:53
  • @JonSkeet Well, it still needs a compiler, the .Net SDK and the entire MS Windows. – liftarn Oct 17 '22 at 13:00
  • That can't be correct, nobody ever expects their code to fail like this. It has nothing to do with settings or the presence of NaN or the OS, the problem is caused by a library you use that destabilizes the runtime. You'll have to find it, before you can eliminate or repair it, and that requires using the debugger as I described in the linked post. – Hans Passant Oct 17 '22 at 13:09
  • @JeroenMostert It was tested on a fairly, but not completely, blank laptop that is used for this type of testing. I will look into whether I can get a completely blank install. – liftarn Oct 17 '22 at 13:09
  • @HansPassant Yes, as Douglas Adams wrote "The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at and repair." – liftarn Oct 17 '22 at 13:12
  • This code still doesn't compile as-written, but after getting it to the stage where it *will* compile, it doesn't cause any problems for me: https://gist.github.com/jskeet/f514e46c2f0d39073527617dd13cdbda (Run with `dotnet run -c Release`, using .NET 6.0, on Windows 11 build 22621.674. We don't know which platform you're targeting.) – Jon Skeet Oct 17 '22 at 13:14
  • As for "Well, it still needs a compiler, the .Net SDK and the entire MS Windows" - comments like that when you haven't included *all of your source code* really don't help your case. – Jon Skeet Oct 17 '22 at 13:15
  • @JonSkeet I have tried both .NET 4.0 and 4.8 with no change in behaviour. – liftarn Oct 17 '22 at 13:17
  • Just tried with both of those, and it's still fine. Of course we *also* don't know what kind of app this is - I've just tested with a console app, but I guess it *could* make a difference if it's WPF or WinForms... (If you can reproduce with a console app, and confirm that in the question, along with the information about the framework and ideally a project file, that would help.) – Jon Skeet Oct 17 '22 at 13:19
  • @JonSkeet I tried a minimal WinForms and it works. – liftarn Oct 17 '22 at 13:32
  • "Works" as in "reproduces the problem" or "works" as in "runs without issue"? (We really shouldn't have to ask this many questions just to find a way of reproducing the problem...) – Jon Skeet Oct 17 '22 at 14:02
  • @JonSkeet The minimal program runs without exceptions. Well, it's not a trivial problem. – liftarn Oct 17 '22 at 14:06
  • You have a 64 bit floating point number that is defined by IEEE 754. There are two NAN values according to the binary paragraph at following : https://en.wikipedia.org/wiki/IEEE_754?force_isolation=true – jdweng Oct 17 '22 at 14:13
  • Okay, so it's not actually an [mcve] in that it doesn't reproduce the problem. Your earlier update give me the impression that you *had* reproduced the issue with the code you'd presented. Until you can provide a way of reproducing the problem, I can't see how anyone is going to be able to help you. – Jon Skeet Oct 17 '22 at 14:52
  • @JonSkeet As it's only appearing on a specific version of Windows it's quite possible it's not directly associated with the code at all. – liftarn Oct 18 '22 at 06:26
  • Until you can find that out, we really can't help you. Your next step should be to create a genuine minimal example that at least fails for you on the same system that the full app fails. At that point, others can try to see if it fails on their system too. But "here's some code which doesn't demonstrate the problem that my app has" doesn't help anyone. – Jon Skeet Oct 18 '22 at 06:38
  • @JonSkeet I have been open all along that it's not the code that's the problem, as it works on other versions of Windows and in debug mode. I'm trying to debug Windows. – liftarn Oct 18 '22 at 07:09
  • But you *haven't* been open with "I can't reproduce the problem in a minimal example." Indeed, when I asked for a repro, you edited the post and said "I have done so." I'm done here; I really can't see this being productive. – Jon Skeet Oct 18 '22 at 08:08

1 Answer


Not a fix as such, but a workaround. The problem happens in the FPU and may be caused by older programs or libraries. Why it appeared in Windows 11 (22621.525) but not in earlier versions may have to do with changes in how Windows works.

Anyway, you can get around the problem by forcing a reset of the FPU by adding

try
{
    throw new Exception("Please ignore, resetting FPU");
}
catch {}

before you call the function that causes the exception. If you need to do it from more than one place, it may be a good idea to wrap it in a function.
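Wrapped in a function, the workaround might look like the sketch below. The class and method names are hypothetical. The comment about why it works is a hedged reading of the linked Stack Overflow discussion (the CLR's exception handling resetting the FPU state as a side effect), not something verified here.

```csharp
using System;

public static class FpuReset
{
    // Workaround from this answer: throw and immediately catch a managed exception.
    // Reportedly the CLR's exception handling restores the FPU state as a side effect;
    // this helper only wraps the trick so it can be called from several places.
    public static void Reset()
    {
        try
        {
            throw new Exception("Please ignore, resetting FPU");
        }
        catch (Exception)
        {
            // Intentionally swallowed: the exception is thrown only for its side effect.
        }
    }
}
```

Call `FpuReset.Reset()` right before the code that fails, for example just before the `MySettings.MySetting` initializer from the question.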

liftarn