30

Consider the ordered and unordered associative containers in C++ keyed on double.

Is NaN a valid key type?

With ordered containers, I should say "no", because it does not respect the strict weak ordering.

With unordered containers, I have no idea.

Here's what happens in GCC 4.6.2:

#include <map>
#include <unordered_map>

#include <cmath>

#include <iostream>
#include <prettyprint.hpp>

int main()
{
  typedef std::map<double, int> map_type; // replace by "unorderd_map"

  map_type dm;
  double d = std::acos(5); // a good nan

  dm[d] = 2;
  dm[d] = 5;
  dm[d] = 7;

  std::cout << "dm[NaN] = " << dm[d] << ", dm = " << dm << std::endl;
}

For the ordered map, I get:

dm[NaN] = 7, dm = [(nan, 7)]

For the unordered map, I get:

dm[NaN] = 0, dm = [(nan, 0), (nan, 7), (nan, 5), (nan, 2)]

So in the ordered map, all NaNs are treated the same, which is what I expect, although it seemed like NaN would violate the requirements. For the unordered map, however, I can never retrieve an element again, and all NaNs are different. This is also not what I would expect.

Does the standard have to say anything on this matter?

Update: Thanks to the great answers below, note that the std::map will break if you insert anything else into it once a NaN is in it.

(I'd be very grateful for comments on how other languages handle floating point keys in associative containers.)

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • 1
    The Standard can't really have anything to say about NaNs specifically, because a NaN is an IEEE concept. In other words, its platform-dependant. – John Dibling Nov 11 '11 at 16:24
  • 1
    @JohnDibling: That's a good point, although the standard recognizes that numerical types *may* have a "nan" value; as in ``. – Kerrek SB Nov 11 '11 at 16:27
  • > Consider the ordered and unordered associative containers in C++ keyed on double. > Is NaN a valid key type? Wrong question. The correct question is: which possible **domain of the builtin less-than** contains NaN? – curiousguy Dec 26 '11 at 03:54

3 Answers3

18

They are both forbidden by the standard.

For the (ordered) associative containers, the definition of strict weak order (25.4/4) says:

If we define equiv(a, b) as !comp(a, b) && !comp(b, a), then the requirements are that comp and equiv both be transitive relations ... equiv(a, b) && equiv(b, c) implies equiv(a, c)

This fails for a = 0.0, b = NaN, c = 1.0, comp = std::less<double>()

For the unordered containers, 23.2.5/3 says that the equality predicate Pred "induces an equivalence relation on values of type Key". Equivalence relations are reflexive, and std::equal_to<double>()(NaN,NaN) is false, so equal_to<double>() is not an equivalence relation.

By the way, keying containers on a double is slightly scary in the same way that comparing doubles for equality is always slightly scary. You never know what you're going to get in the least significant bit.

Something I've always considered a little odd is that the standard expresses the requirements in terms of the key type, not in terms of the actual key values added to the container. I believe you could choose to read this as not guaranteeing that map<double, int> has defined behaviour at all if the implementation supports NaNs, regardless of whether you actually add a NaN to an instance or not. In practice, though, an implementation of std::map cannot somehow conjure a NaN out of its back pocket and try to compare it, it only ever compares key values passed to the instance. So it should be OK (if slightly scary) provided you avoid adding NaNs.

I'd be very grateful for comments on how other languages handle floating point keys in associative containers

A few quick experiments in Python (where set and dict are unordered associative containers which hold keys and values by reference) suggest that NaNs are treated as objects that are unequal in value even if they're "the same NaN", but the same nan object can be found again by identity. As far as I've seen, the containers don't seem to be disturbed by containing multiple nans, or a mixture of nans and other values:

>>> thing = set()
>>> nan = float('nan')
>>> nan
nan
>>> thing.add(nan)
>>> thing.add(nan)
>>> thing
set([nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan] = 2
>>> thing[nan]
2
>>> nan2 = float('nan')
>>> thing[nan2] = 3
>>> thing
{nan: 2, nan: 3}

>>> thing = set()
>>> thing.add(nan)
>>> thing.add(nan2)
>>> thing
set([nan, nan])

>>> thing = dict()
>>> thing[nan] = 1
>>> thing[nan2] = 2
>>> thing[0] = 3
>>> thing
{nan: 1, nan: 2, 0: 3}
>>> thing.keys()
[nan, nan, 0]
>>> thing.values()
[1, 2, 3]
>>> thing[0]
3
>>> thing[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
  • The authoritarian tone of the answer evokes a certain "accept" reflex, but I shall wait a bit longer :-) – Kerrek SB Nov 11 '11 at 16:42
  • 3
    One thing I've left out is that if `NaN` is the only value that you ever add to your `map`, then `less()` is a strict weak order on that value alone. It's not hard to write a function that works as a SWO on a domain of size 1, the only requirement is to return `false`. In fact I think it takes a NaN and *two* distinct normal values before it fails: with NaN and one value it should just behave as though `NaN` is equivalent to that value. But such maps probably aren't very interesting... – Steve Jessop Nov 11 '11 at 17:01
  • 1
    @Steve: I think your two-element example behaves as if `NaN` is _equal_ to the other value. (Neither is less than the other => they are eqivalent.) Which means you will never actually see the other element in the map :-) – Nemo Nov 11 '11 at 17:07
  • @Nemo: agreed, in fact I corrected the comment but I guess we crossed in the mail. – Steve Jessop Nov 11 '11 at 17:33
  • That Python example is just plain weird. I can't find any sensible mental model in which that behaviour makes sense. – Kerrek SB Nov 11 '11 at 23:42
  • @Kerrek: the mental model is that the equivalence relation used by Python in the hash table is something like `x == y or x is y`. So two different calls to `float('nan')` produce different keys, but any object is always the same key when seen again. – Steve Jessop Nov 13 '11 at 18:22
  • "_if NaN is the only value that you ever add to your map_" Once you have that set/map with a NaN in it, you cannot add any other float value, because this is actually the same value WRT to `<`. So if you have a set/map with **one** element, and its key is NaN, you can try to insert as many other values as you want! – curiousguy Dec 26 '11 at 01:04
  • Actually, if you read [alg.sorting]: "If we define equiv(a, b)"... there is no explanation about where `a` and `b` are coming from... there are just there. This is the funny std style, and it's why nobody serious is taking the std too seriously. Then in [associative.reqmts]: "For any two keys k1 and k2 **in the same container**, calling comp(k1, k2) shall always return the same value." which obviously is insufficient... – curiousguy Dec 26 '11 at 03:37
  • The half-C++ half-math description used in the std is quite comical: "for all `x`(sic)" ROTFL! Anyway, the writing style is soooo heavy and barely readable, I suppose nobody is reading this as a "specification". – curiousguy Dec 26 '11 at 03:38
5

This is because

std::less<double>(NaN, NaN) == false

Like you said, the weak total ordering (required for std::map<>) is ok, the equality (or equivalence, extra requirement for any hash-based container) isn't ok to satisfy the key requirements for the hash (unordered) map

Irreflexivity   f(x, x) must be false. 
 Antisymmetry   f(x, y) implies !f(y, x) 
 Transitivity   f(x, y) and f(y, z) imply f(x, z). 
 Transitivity of equivalence     

         Equivalence (as defined above) is transitive: if x 
         is equivalent to y and y is equivalent to z, then x is 
         equivalent to z. (This     implies that equivalence does 
         in fact satisfy the mathematical definition of an equivalence 
         relation.)

Seeing that for std::map, equivalence is when !less(a,b) && !less(b,a), I'd say all constraints are met.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • Yes, I appreciate that, but is this behaviour documented or commented on somehow in the standard? Also, I wouldn't say that "the ordering is OK", since NaN does *not* obey a SWO: it is *not* one of { less, greater, equal } when compared to anything. – Kerrek SB Nov 11 '11 at 16:18
  • 1
    @KerrekSB see [SGI std::map](http://www.sgi.com/tech/stl/Map.html) and click through to [SGI Strict Weak Ordering](http://www.sgi.com/tech/stl/StrictWeakOrdering.html#1) page – sehe Nov 11 '11 at 16:24
  • 9
    The part of SWO that NaN values fail is the transitivity of equivalence. `0 < NaN` and `NaN < 0` are both false. `1 < NaN` and `NaN < 1` are both false. Therefore, `0 < 1` and `1 < 0` should both be false, but they aren't. Hence the map has undefined behavior as soon as you put a NaN in it. Arguably it even has UB earlier, since the standard says that the order must be a SWO on the key *type*, not just on the key values actually added to the map, but that's quite a fine point. – Steve Jessop Nov 11 '11 at 16:26
  • Hmm... I see: Given that NaN is never *less* than anything, those conditions are vacuously satisfied. OK, that's fine. So for `map`, `double` is a perfectly legitimate key type. – Kerrek SB Nov 11 '11 at 16:27
  • @Kerrek: no, listen to me, not Billy and sehe! ;-) – Steve Jessop Nov 11 '11 at 16:28
  • @SteveJessop: I am, I am -- well spotted on the transitivity! – Kerrek SB Nov 11 '11 at 16:29
  • 1
    @Steve: Yes, I screwed that one up. Kerrek: Your test will fail if you try to insert anything *other than* a NaN into the map at this point -- it'll be treated as equivalent to the NaN and therefore won't be inserted. – Billy ONeal Nov 11 '11 at 16:29
  • Remember, a strict weak order is actually a **total** order. – curiousguy Dec 26 '11 at 03:51
  • SGI does not define the C++ standard. – Lightness Races in Orbit Dec 18 '12 at 09:44
3

NaNs can be stored inside a map -- namely, they are copy constructable and less than comparable. std::less for doubles doesn't meet map's requirements for a strict weak ordering, so you've technically got undefined behavior here. However, the behavior is easily explained, even if not required by the standard. Map uses equivalence rather than equality to determine whether an item is a duplicate. Two NaNs compare equivalent, but not equal. However, that falls apart in some cases. For example, if you tried to insert something other than a NaN into that map, it would be treated as equivalent to the NaN and you'd get no insert. Try adding some real numbers in addition to the NaNs and you can see map breaks down too.

The hash behavior is expected but not defined for a hash table as well -- hash tables require their contents to be copy construcable and equality comparable. The hashes of multiple NaNs compare equal, so they're all going to go into the same bucket, but a hash table uses equality comparison rather than less than comparison (equality rather than equivalence). Therefore, none of the NaNs compare equal to each other and you get multiple inserts for that key. This is because NaN breaks the hash table's equality comparable requirement -- namely the invariant that std::equal_to(x, x) true.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • Yeah, cheers... I can see all those things; I was just wondering if that's somehow an acknowledged corner of the standard; or if every compiler vendor should be saying, "don't use NaN values in our associative containers". – Kerrek SB Nov 11 '11 at 16:30
  • @KerrekSB: I think in map's case you've technically got undefined behavior because of the SWO issue mentioned in the other answer. I think the hash table's behavior is actually well defined. I don't think the standard says any of this explicitly; it just comes along for the ride in the requirements of hashable, equality comparable, less than comparable, and strict weak ordering. – Billy ONeal Nov 11 '11 at 16:32
  • @KerrekSB: Put another way, I can't think of any conforming implementation of map and unordered_map having behavior any different than you show in your test, despite the standard not going into that explicitly. – Billy ONeal Nov 11 '11 at 16:33
  • I was thinking that you'd have to provide a different predicate functor which inherits `equal_to` trivially, but is specialised to numeric types with NaN and compares those equal... – Kerrek SB Nov 11 '11 at 16:36
  • The problem with NaN in a hash table is that lookups always return "false". So you can insert the element, but you will never find it... Are you sure `unordered_map` does not require `operator==(x,x)` to be true? – Nemo Nov 11 '11 at 16:36
  • @KerrekSB: That should work. Of course, you're kind of changing the semantics of NaN by doing that. – Billy ONeal Nov 11 '11 at 16:37
  • @Nemo: Yep, you're right. Then both types of containers are undefined. – Billy ONeal Nov 11 '11 at 16:37
  • @Nemo and KerrekSB: I've significantly updated this answer to correct it. :) Thanks for the counterexamples! – Billy ONeal Nov 11 '11 at 16:41
  • "_Map uses **equivalence** rather than **equality** to determine whether an item is a duplicate._" Std uses both terms actually. Anyway, it's just a matter of choosing a word, and using it consistently. It may be easier (less confusing) to talk about equality of keys and total ordering, than equivalence and strict weak ordering. – curiousguy Dec 26 '11 at 03:49
  • @curiousguy: As far as I am aware, the standard does not use both terms. Equality is specific; it means `a == b`. Equivilence is something else entirely; it's `!(a < b || b < a)`. Can you find a portion of the standard violating this? (In C++03; I don't have a copy of 0x to use :( ) – Billy ONeal Dec 27 '11 at 16:21
  • @BillyONeal "_As far as I am aware, the standard does not use both terms_" Even if the std doesn't say "equal", "equivalent" is not used consistently. In the explanation of `set`/`multiset`..., "_is an associative container that supports unique keys (contains at most one of each key value)_" These words (value, unique) imply equality: two values are the same iff they are equal, not equivalent. – curiousguy Feb 18 '12 at 05:32
  • @curiousguy: I don't think that makes any sense. As far as set and multiset are concerned, everything is done via equivalence, not equality -- after all, there's no way to make these containers respect `operator==` or any kind of equality functor. Unique as far as the container is concerned is equivalent, not equal. – Billy ONeal Feb 18 '12 at 06:06
  • @BillyONeal "_there's no way to make these containers respect operator== or any kind of equality functor_" Obviously. I never mentioned any of those. Equality of keys means mathematical equality, not a C++ function with 2 arguments. "Unique" means that only one key with mathematically equal value is allowed in the container. I see no problem here: equality is the same equivalence, as defined according to `comparator`. There is no other way to defined "equal keys". – curiousguy Feb 18 '12 at 06:25
  • @BillyONeal "_Equality is specific; it means a == b._" Two strings may compare equal with `operator==`, that doesn't mean they are equal objects, because equality of objects is not well defined (and arguably two objects are only equal if they have the same address). We say that these string objects have "the same value", where "value" means what is compared by `operator==`. – curiousguy Feb 18 '12 at 06:29
  • @curiousguy: No, we don't. Equality means `std::equal`, which means `operator==` if defined for the type in question. There are no equal objects in C++ because everything is a value type. – Billy ONeal Feb 19 '12 at 03:32
  • @BillyONeal "_There are no equal objects in C++ because everything is a value type._" I am not sure what you mean. – curiousguy Mar 05 '12 at 02:20