I just stumbled across this interesting behavior of Python involving `NaN`s in `set`s:
```python
# Test 1
nan = float('nan')
things = [0, 1, 2, nan, 'a', 1, nan, 'a', 2, nan, nan]
unique = set(things)
print(unique)  # {0, 1, 2, nan, 'a'}
```
```python
# Test 2
things = [0, 1, 2, float('nan'), 'a', 1, float('nan'), 'a', 2, float('nan'), float('nan')]
unique = set(things)
print(unique)  # {0, 1, 2, nan, nan, nan, nan, 'a'}
```
That the same key `nan` shows up multiple times within the last set of course seems strange. I believe this is caused by `nan` not being equal to itself (as defined by IEEE 754), together with the fact that objects are compared by identity (`id()`) prior to equality of values when adding objects to a `set`. It then appears that each call to `float('nan')` results in a fresh object, rather than returning some global "singleton" `float` (as is done for e.g. the small integers).
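This identity-before-equality reasoning is easy to check directly; the snippet below is just a sanity check of the above, not anything new:

```python
nan = float('nan')

# A NaN is never equal to itself (IEEE 754)...
print(nan == nan)  # False
# ...but it *is* the same object, which is what containment checks first.
print(nan is nan)  # True

# Two separate calls produce two distinct, mutually unequal objects.
a, b = float('nan'), float('nan')
print(a is b, a == b)  # False False
```

This is consistent with Test 1 collapsing the repeated `nan` (same object) while Test 2 keeps every `float('nan')` (distinct objects).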
- In fact I just found this SO question describing the same behavior, seemingly confirming the above.
Questions:

- Is this really desired behavior?
- Say I was given the second `things` from above. How would I go about counting the number of actually unique elements? The usual `len(set(things))` obviously does not work. I can in fact use `import numpy as np; len(np.unique(things))`, but I would like to know if this can be done without using third-party libraries.
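For the record, one pure-Python approach I can imagine is to treat all NaNs as a single value: filter them out and remember whether any were seen. This is a minimal sketch (`count_unique` is just a name I made up), assuming that all NaNs should count as one unique element:

```python
import math

def count_unique(items):
    # Count distinct elements, treating all NaNs (which are
    # mutually unequal) as one and the same value.
    seen = set()
    saw_nan = False
    for x in items:
        if isinstance(x, float) and math.isnan(x):
            saw_nan = True
        else:
            seen.add(x)
    return len(seen) + saw_nan  # bool adds 0 or 1

things = [0, 1, 2, float('nan'), 'a', 1, float('nan'), 'a', 2, float('nan'), float('nan')]
print(count_unique(things))  # 5: {0, 1, 2, 'a'} plus "the" NaN
```

The `isinstance` check matters because `math.isnan` raises `TypeError` on non-numbers such as `'a'`.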
Addendum

As a small addendum, let me add that a similar story holds for `dict`s:

```python
d = {float('nan'): 0, float('nan'): 1}
print(d)  # {nan: 0, nan: 1}
```
I was under the impression that `NaN`s were a total no-go as keys in `dict`s, but it does actually work out, as long as you store references to the exact objects used as keys:

```python
nan0 = float('nan')
nan1 = float('nan')
d = {nan0: 0, nan1: 1}
d[float('nan')]  # KeyError
d[nan0]  # 0
d[nan1]  # 1
```
Surely hacky, but I can see this trick being useful if one needs to store additional values in an existing `dict` and does not care which keys are used, except of course that each new key must not already be present. That is, one can use `float('nan')` as a factory for an unending supply of new `dict` keys, guaranteed never to be equal to each other or to any existing or future key.
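A purely illustrative sketch of that idea, assuming you never need to look the values up by key again without having kept a reference:

```python
d = {'a': 1, 'b': 2}

# Each float('nan') is a fresh object that compares unequal to every
# other key, so it can never overwrite an existing entry.
for value in [10, 20, 30]:
    d[float('nan')] = value

print(len(d))  # 5: the two original keys plus three NaN keys
print(sorted(d.values()))  # [1, 2, 10, 20, 30]
```

One caveat I am aware of: on older CPython versions all float NaNs hash to the same value, so piling up many NaN keys can degrade lookups toward linear time; newer CPython derives the hash of a NaN from the object's identity instead.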