I know how to remove duplicates from a list. I'm just curious why a set does not keep the order of the original list.

>>> my_list = ['apple', 'mango', 'grape', 'apple', 'guava', 'pumpkin']
>>> [*set(my_list)]

# output (two different results):
['mango', 'apple', 'grape', 'guava', 'pumpkin']
['pumpkin', 'guava', 'grape', 'mango', 'apple']
Ramesh

1 Answer

As all the comments say, a set is unordered, always.

But internally it uses a hash table, and IIRC the slot where each value is stored is derived from the object's hash modulo the table size. Small integers tend to have themselves as their hash values, so you may get the impression that they come out sorted (not in insertion order), but this won't always be the case:

>>> ls = [1, 2, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls = [2, 1, 3]
>>> [*set(ls)]
[1, 2, 3]

>>> ls2 = [-1, -2, 3]
>>> [*set(ls2)]
[3, -1, -2]

>>> ls2 = [-2, -1, 3]
>>> [*set(ls2)]
[3, -2, -1]
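
To make the "hash modulo table size" idea concrete, here is a minimal sketch. It is not how CPython is actually implemented: `first_slot` is a hypothetical helper, and the table size of 8 is an assumption about the initial size of a small set in current CPython, not something the language guarantees.

```python
# Illustrative only: approximates which slot a value's hash points to first.
# TABLE_SIZE = 8 is an assumption, not a documented guarantee.
TABLE_SIZE = 8


def first_slot(value, table_size=TABLE_SIZE):
    """Return the slot index the hash maps to, before any collision handling."""
    return hash(value) % table_size


for value in (1, 2, 3, -1, -2):
    # Columns: the value, its hash, and the slot it would be tried in first.
    print(value, hash(value), first_slot(value))
```

Under these assumptions 1, 2 and 3 land in slots 1, 2 and 3, which is why they appear sorted. In CPython `hash(-1)` is `-2`, so -1 and -2 point to the same slot; whichever is inserted first takes it, which would be consistent with the two negative examples differing only in the order of -1 and -2.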

Other objects, like the strings in your example, have very different hash values, so the behaviour is totally different:

>>> hash('mango')
-7062263298897675226
gimix
  • So the order depends on the hash value of the string, and the hash value changes every time. Am I right? – Ramesh Aug 10 '22 at 06:42
  • no, the hash value for an object is always the same; the way it is stored in a hash table may change depending on the table size. Btw in your example if you simply assigned the list once and extracted the set items many times I would expect the result to always be the same. – gimix Aug 10 '22 at 06:48
  • The hash value for an *object* might (or should?) always be the same, but the hash value for a *value* isn't always, and that's probably what happened for them and it would be good to cover that. Try running [this](https://tio.run/##K6gsycjPM7YoKPr/v6AoM69EIyOxOENDPS0/PymxSF1T8/9/AA) a few times, for example (with the "play" button at the top). Do you always get the same output? – Kelly Bundy Aug 10 '22 at 09:45
  • Well, IDLE, Python 3.7.0: `x = hash('mango')`, `for i in range(100000): assert hash('mango') == x` doesn't raise any `AssertionError`. Why should the hash value change? – gimix Aug 10 '22 at 13:57
  • That's not what I said. If you don't want to do it online, then put a single `print(hash('mango'))` into a script file and run that script multiple times. Does it always print the same hash? – Kelly Bundy Aug 10 '22 at 15:58
  • That's interesting: even if I simply print the hash of a string in IDLE, then restart the shell and print it again, the values are different. But at that point they are different processes, so it seems the hash calculation, for strings at least, uses some data taken from the process (I didn't check the source code, however) – gimix Aug 10 '22 at 16:19
  • Sounds like you're finally able to reproduce what the question is about :-). If you do that restart with what the question does, you should see the list orders differ (99% of the time). – Kelly Bundy Aug 10 '22 at 19:17
  • But the OP didn't say they were running different processes every time :O – gimix Aug 10 '22 at 20:52
  • @gimix My bad, I didn't mention it, but I gave the output as an example. – Ramesh Aug 11 '22 at 04:36
  • @KellyBundy So the hash value is randomized by default, and depending on the hash value at that time, the order changes. Am I right? – Ramesh Aug 11 '22 at 04:40
  • I found an interesting [old post](https://stackoverflow.com/questions/2070276/where-can-i-find-source-or-algorithm-of-pythons-hash-function). Look especially at the second answer, although some details _may_ have changed in newer versions – gimix Aug 11 '22 at 06:39
  • Long story short, "Note that you shouldn't rely on any specific behaviour from hash(). It may be different from version to version, and for some objects, even from run to run" as one comment in that post says – gimix Aug 11 '22 at 06:43
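
The per-process behaviour discussed in the comments can be checked with a small sketch like the one below. It starts fresh interpreters and asks each one for `hash('mango')`. The helper name `string_hash_in_new_process` is made up for this example; `PYTHONHASHSEED` is the real environment variable that controls string hash randomization in CPython.

```python
import os
import subprocess
import sys


def string_hash_in_new_process(text, seed=None):
    """Return hash(text) as computed by a freshly started interpreter.

    seed=None leaves PYTHONHASHSEED unset, so the hash is randomized per
    process (the default). An integer seed makes the hash reproducible.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # do not inherit a seed from the parent
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Default settings: the two values will almost certainly differ.
print(string_hash_in_new_process("mango"), string_hash_in_new_process("mango"))

# Fixed seed: both processes report the same value.
print(string_hash_in_new_process("mango", seed=1),
      string_hash_in_new_process("mango", seed=1))
```

Since the position of a string in the set depends on its hash, a different hash in each process is enough to explain a different order on each run, while repeated calls inside one process keep returning the same order.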