One key thing that's hinted at mgilson's great answer, but isn't mentioned explicitly in any of the existing answers:
Small integers hash to themselves:
>>> [hash(x) for x in (1, 2, 3, 88)]
[1, 2, 3, 88]
Strings hash to values that are unpredictable. In fact, from 3.3 on, by default, they're built off a seed that's randomized at startup. So, you'll get different results for each new Python interpreter session, but:
>>> [hash(x) for x in 'abcz']
[6014072853767888837,
8680706751544317651,
-7529624133683586553,
-1982255696180680242]
So, consider the simplest possible hash table implementation: just an array of N elements, where inserting a value means putting it in hash(value) % N
(assuming no collisions). And you can make a rough guess at how large N
will be—it's going to be a little bigger than the number of elements in it. When creating a set from a sequence of 6 elements, N could easily be, say, 8.
What happens when you store those 5 numbers with N=8? Well, hash(1) % 8
, hash(2) % 8
, etc. are just the numbers themselves, but hash(88) % 8
is 0. So, the hash table's array ends up holding 88, 1, 2, NULL, NULL, 5, NULL, 7
. So it should be easy to figure out why iterating the set might give you 88, 1, 2, 5, 7
.
Of course Python doesn't guarantee that you'll get this order every time. A small change to the way it guesses at the right value for N
could mean 88
ends up somewhere different (or ends up colliding with one of the other values). And, in fact, running CPython 3.7 on my Mac, I get 1, 2, 5, 7, 88
.0
Meanwhile, when you build a hash from a sequence of size 11 and then insert randomized hashes into it, what happens? Even assuming the simplest implementation, and assuming there are no collisions, you still have no idea what order you're going to get. It will be consistent within a single run of the Python interpreter, but different the next time you start it up. (Unless you set PYTHONHASHSEED
to 0
, or to some other int value.) Which is exactly what you see.
Of course it's worth looking at the way sets are actually implemented rather than guessing. But what you'd guess based on the assumption of the simplest hash table implementation is (barring collisions and barring expansion of the hash table) exactly what happens.