8

when creating a set:

>>> falsey_set = {0, '', False, None}  # set([False, '', None])
>>> falsey_set = {False, '', 0, None}  # set([0,'', None])
>>> # adding an item to the set doesn't change anything either
>>> falsey_set.add(False)  # set([0,'',None])

or a dictionary, which mimics the behavior somewhat:

>>> falsey_dict = {0:"zero", False:"false"}  # {0:'false'}  # that's not a typo
>>> falsey_dict = {False:'false', 0:'zero'}  # {False: 'zero'} # again, not a typo
>>> falsey_set.add(())  # set([0,'', None, ()])
>>> falsey_set.add({})  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
>>> falsey_dict[()] = 'list'  # {False:'zero', ():'list'}
>>> falsey_dict({}) = 'dict'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

0 and False always remove one another from the set. In the case of dictionaries they are incorrect altogether. Is there a reason for this? While I realize that booleans are derived from integers in Python. What's the pythonic reasoning for acting this way in the context of sets specifically (I don't care about dictionaries too much)? Since while useful in truthy comparison like:

>>> False == 0  # True

There is obvious value in differentiation:

>>> False is 0  # False

I've been looking over the documentation and can't seem to find a reference for the behavior

Update

@delnan I think you hit the nail on the head with hash determinism you've mentioned in the comments. As @mgilson notes both False and 0 use the same hashing function, however so do object and many of its subclasses (i.e.:super) that have identical hash functions. The key seems to be in the phrase Hashable objects which compare equal must have the same hash value from the documentation. Since, False == 0 and and both are hashable, their outputs must by Python's definition be equivalent. Finally, the definition of hashable states how sets use hashability in set membership with the following: Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally. While I still don't understand why they both use the same hashing function - I can settle with going this deep.

If we all agree then someone propose a polished answer, and I'll accept it. If there could be some improvement or if I'm off base then please let it be known below.

Marc
  • 4,820
  • 3
  • 38
  • 36
  • Why should the identity matter? Two equal strings can have different identities, imagine the pain if they were considered distinct by sets and dicts. Also note that by the definition of hashability, `hash(x) == hash(y)` **must** hold when `x == y`. –  Oct 09 '14 at 22:24
  • Related question: http://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-is-it-guarante. – Roger Fan Oct 09 '14 at 22:30
  • 1
    You shouldn't get too hung up on the hashing part of this; it's mostly irrelevant. The simple answer is that dict and set (and list, come to that) membership is based on *equality* (unless you want to nitpick, in which case a better description is identity-then-equality), and since `False == 0`, they're considered the same when used as set elements or dictionary keys. The fact that dicts and sets use a hash table (and hence that keys and elements must be hashable, and equality should imply equality of hashes) is secondary. – Mark Dickinson Oct 10 '14 at 07:56
  • I agree that, simply put it's based on equality, ***except*** that you can't have a list, dict, or set inside a set because it throws this error ```TypeError: unhashable type:```. So saying that it's based on equality is a simplistic model but accurate enough for real use and, like you say, mostly irrelevant. Thanks for the reality check. – Marc Oct 11 '14 at 16:11

2 Answers2

3

It's because False and 0 hash to the same value and are equal.

The reason that they hash to the same value is because bool is a subclass of int so bool.__hash__ simply calls the same underlying mechanics that int.__hash__ calls...

>>> bool.__hash__ is int.__hash__
True
mgilson
  • 300,191
  • 65
  • 633
  • 696
  • 1
    If they are equal and [hashable](https://docs.python.org/3/glossary.html#term-hashable), they *must* hash to the same value. @RogerFan –  Oct 09 '14 at 22:26
  • @delnan -- Not exactly. It's pretty easy to cook up a contrived example where two objects hash to the same value but aren't equal... – mgilson Oct 09 '14 at 22:28
  • I think arguing by the subclass relationship is putting the cart before the horse. `True` and `False` are intended to be the integers 1 and 0 with different `repr()`, that's why they are implemented as subclass of `int`. –  Oct 09 '14 at 22:29
  • 2
    "Must" in the sense of "really really should and is even explicitly required in the documentation", not in the sense of "it is enforced that". Also, you got it backwards: Equal hashes for different objects are just ordinary collisions and expected; the bad counter example would be equal objects with different hashes. –  Oct 09 '14 at 22:30
  • int and bool are different objects but they share the same ```__hash__``` function by identity: `````` but I think it would be much to ask for a counter example. – Marc Oct 11 '14 at 15:59
0

First, let's try to explain what's going on at the beginning, with your falsey_set and falsey_dict, so you see that it's not "incorrect", but in fact only possible consistent solution. To do so, we will remove bools from the picture temporarily, and use something that more people grasp intuitively: decimal numbers.

>>> numset = {3, 5, 3.0, 4}  # {3.0, 4, 5}
>>> numset.add(3)            # no change

I hope you agree that this is exactly how set should work. If you don't, then it seems that either you think 3 and 3.0 are not really equal, or you think that a set should be allowed to have equal elements. Neither of these are really productive beliefs IMO.

(Of course, which one of 3 and 3.0 ends up in the set is a matter of processing displays, and set is a bit weird since it is an atrophied dict where key and value are the same. But it is consistent and specified in Pythton. The point for now is, surely they cannot both be in a set.)

One more point: as you see, the thing I can add many other true things into my set (like 4 and 5) doesn't matter at all. Same, the fact you can add many other false things in your set (like '' and None) doesn't matter at all. Truth is a red herring. A set can have true elements and false elements. What is cannot have, is equal elements.

>>> numdict = {3:"a", 3.0:"b"}  # {3:"b"}

This looks weirder at a first glance, but is in fact much clearer what's going on, since keys and values are separate. Python rules are precise: read dict display from left to right, take every pair a:b, then if key a is already in the dict, update its value to b, else insert key a into the dict with value b.

With that algorithm, I guess it's obvious how the final dict ends up like that, and all the other behaviours you've noticed. What's important is that, like in a set, what you really need in a dict is to have only one value for any given key. Having two equal keys in the same dict would be an invitation to disaster, since then you'd be able to assign them different values.

So, in a nutshell: I think you dug yourself too deep with hash functions and other implementation stuff. These are nice way of seeing how Python does X, once you realize that X is the right thing to do. But first you have to see that X is the right thing to do. And I hope I've shown that to you now. A set cannot have equal elements. It would defeat a widely used purpose of a set, removing duplicates. And 3 and 3.0 really are equal. This has nothing to do with types, some embeddings are so natural we've erased them on a mathematical level.

Of course, that leaves the question "why are 0 and False really equal"? In fact, the answer is not very different: just another mathematically erased embedding that's so incredibly useful we would have to jump through many ridiculous hoops without it. For more about that, read about Iverson bracket. ;-) But anyway, it seems you know about that part. The above is what was problematic, I guess.

Veky
  • 2,646
  • 1
  • 21
  • 30