std::unordered_set insert hangs program when inserting a large number of nan values

Question

I am writing a unit test for something which ends up putting a vector of values into a set like so:

    std::size_t n = 1e5;

    float vec_value = 1;

    std::vector<float> data(n, vec_value);
    std::unordered_set<float> s;
    for (auto x : data) {
        s.insert(x);
    }

This works fine when the vector has a non-nan value. However, if the vector is nan, the program just hangs and never completes:

    std::size_t n = 1e5;

    float vec_value = std::numeric_limits<float>::quiet_NaN();

    std::vector<float> data(n, vec_value);
    std::unordered_set<float> s;
    for (auto x : data) {
        s.insert(x);
    }

even if I use:

std::unordered_set<float> s(data.begin(), data.end());

The program doesnt finish.

Ive put this minimal example into godbolt and I get the same behaviour and I dont know why this happens.

If i lower the number of points down to 1e3/1e4 then it works fine. As soon as I go 1e5 and above points, the program never finishes.

https://godbolt.org/z/nE4xv7G4q

`NaN` never compares equal to any value, even another `NaN`. That means every insertion will go into one bucket (they hash the same) and that bucket is a linked list. I wouldn't be surprised if that's taking up a good bit of time. — NathanOliver, Mar 10 '23 at 15:02
And in the (only) bucket equal values are supposed to be grouped to support [equal_range](https://en.cppreference.com/w/cpp/container/unordered_set/equal_range) for matching keys. Not comparing equal to anything doesn't play well with that. Perhaps O(n^2)? — BoP, Mar 10 '23 at 15:11
It would probably finish if you only had the patience to wait. If a new "unequal" number is added last to the list in the only bucket, the complexity is O(1+2+3+...+N) = O(n^2). Making a wild and rough approximation of how much time it could take, 100000 * 100000 microseconds is ≈ 2.8 hours. — molbdnilo, Mar 10 '23 at 15:20
@NathanOliver That was the issue, thanks. Rewrote the way I did my tests and will be careful about that in the future. — Tom McLean, Mar 10 '23 at 15:27
@molbdnilo The actual number of values is 1e6/1e7 so it will be much, much longer. — Tom McLean, Mar 10 '23 at 15:29
Floats as keys sounds risky even if @MarekR had a very nice workaround for your problem. You can't expect that two floating point variables that _should_ mathematically compare equal actually does. — Ted Lyngmo, Mar 10 '23 at 16:01
@TedLyngmo The floating point values should be exactly, exacly, the same — Tom McLean, Mar 10 '23 at 16:02
@TomMcLean I'm not sure I understand what you mean, but if you have two different ways of calculating something that should (logically/mathematically) result in exactly the same thing, that's not something you can trust with floating point variables in C++. They will always be "close" but not always compare equal. — Ted Lyngmo, Mar 10 '23 at 16:05
@TedLyngmo It was a pretty bad test in retrospect, but it was doing a nework call to a service where you give it some times, latititudes and longitudes and it returns the weather data for that point. The times, latittudes and longitudes were all the same points so the data it recieved back should also be all the same. If the time point was outside the weather data range, it would return nan, hence the bug I was having. The right thing to do would of probably to extract that part of the code as a function and test it without a network call, but got to make something work at the end of the day — Tom McLean, Mar 10 '23 at 16:10

std::unordered_set insert hangs program when inserting a large number of nan values

0 Answers0