3

I am trying to understand Python behavior on this simple script:

I build a list, shuffle it, turn it into a set, then into a list again. Surprisingly the final list is sorted.

Let's see the code:

>> array = range(50)
>> array
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

>> import shuffle
>> shuffle(array)
>> array
[0, 32, 4, 35, 27, 40, 30, 7, 36, 24, 11, 26, 2, 45, 1, 41, 47, 5, 48, 18, 49, 20, 46, 44, 43, 31, 10, 42, 9, 13, 14, 15, 28, 29, 8, 6, 34, 19, 22, 33, 3, 17, 23, 38, 37, 16, 39, 21, 12, 25]

>> set(array)  # The data inside the set already seem to be sorted
set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])

>> list(set(array))  # The original array is sorted
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

Could you explain why my final list is sorted ? I thought a set was an unordered collection. What side effect am I missing here ?

Many thanks

Vineeth Sai
  • 3,389
  • 7
  • 23
  • 34
DareYang
  • 369
  • 2
  • 9
  • 4
    *Unordered* doesn't mean *random*. It has some deterministic way to hash and therefore order values. With your values, that appears to happen to be numerically sorted by pure coincidence. Probably because of interned integers and their memory locations. – deceze Sep 20 '18 at 09:17
  • 1
    https://stackoverflow.com/questions/32484953/why-does-a-set-of-numbers-appear-to-be-sorted – Vivek Sep 20 '18 at 09:18
  • It's an artifact of how `set`'s hash table works, and the fact that the hash of an integer is very simple: integers in the machine range (2**-31 to 2**31 on 32 bit CPython) hash to themself. Try it on this list: `[0, 35, 5, 40, 10, 45, 15, 20, 25, 30]` – PM 2Ring Sep 20 '18 at 09:34
  • @deceze The hash value of an integer is determined by the integer's value. It has nothing to do with its memory address. (OTOH, there are some situations where it can be useful for the `__hash__` method of an object to use the id of the object). The order in a set (or a pre-3.6 dict) depends on the items' hash value and the order of insertion (& deletion). – PM 2Ring Sep 20 '18 at 09:42

0 Answers0