When reading questions such as "Get unique values from a list in python", you see remarks that the order may not be preserved.

This is understandable. What bugs me is that it goes further: as far as I can see, the result is not even deterministic, e.g.:

list(set(values_list))

I do get the unique values, but with each run they come out in a different order. So this would mean that set (its constructor or its iterator) is not deterministic.

I wonder how this happens. I don't see a reason why single-threaded execution (?) would give non-deterministic behavior.

Sure, one can sort the outcome to enforce deterministic behavior, but you only get that idea once you have noticed that the code at hand is non-deterministic.
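
For reference, a minimal sketch of two ways to get a run-independent result. The sample data is hypothetical and only stands in for the real labels. Note that `dict` keeping insertion order is a language guarantee only from Python 3.7; in CPython 3.6 it is an implementation detail.

```python
# Hypothetical sample data standing in for the real list of labels.
values_list = ["cat", "dog", "bird", "dog", "cat", "bird", "cat"]

# Option 1: sort the unique values, so the result depends only on the values.
unique_sorted = sorted(set(values_list))

# Option 2: keep first-occurrence order by using dict keys instead of a set.
# Guaranteed from Python 3.7; an implementation detail in CPython 3.6.
unique_in_order = list(dict.fromkeys(values_list))

print(unique_sorted)    # ['bird', 'cat', 'dog']
print(unique_in_order)  # ['cat', 'dog', 'bird']
```

Sorting needs mutually comparable values; the `dict.fromkeys` variant only needs them to be hashable.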

Update:

The essence of my code (the file is a pickle with an array of 10,000 strings, with 3 unique values):

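import pickle
with open("labels.p", "rb") as f:  # close the file once it has been read
    combined = pickle.load(f)
label_keys = list(set(combined))  # unique labels, in arbitrary order
print(label_keys)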

On each run I get a different order. Oh, and I use Python 3.6.4.

astrowalker
  • Sets are *unordered*. So converting a set to a list is not guaranteed to have the same outcome every time. – mkrieger1 Aug 20 '18 at 09:35
  • to clarify, is this question: "why is the set constructor non-deterministic?" – joel Aug 20 '18 at 09:36
  • @mkrieger1 That doesn't mean it's *random* every time though. ***Unless it is: https://mail.python.org/pipermail/python-announce-list/2012-March/009394.html*** – deceze Aug 20 '18 at 09:37
  • Can you show an example? When I tried, `list(set(values))` always gave the same order (tested with a list of random numbers, both with 3.6 and 2.7) – tobias_k Aug 20 '18 at 09:38
  • this question might be relevant https://stackoverflow.com/q/3812429/5986907 – joel Aug 20 '18 at 09:38
  • Or this: https://stackoverflow.com/questions/12165200/order-of-unordered-python-sets – mkrieger1 Aug 20 '18 at 09:39
  • @deceze why a dict question as the duplicate? this is about sets/lists – joel Aug 20 '18 at 09:41
  • @tobias_k, I updated the question. – astrowalker Aug 20 '18 at 09:42
  • @Joel Because the answer applies to dicts and sets equally. – deceze Aug 20 '18 at 09:42
  • Also see the Note at the end of [the `__hash__` docs](https://docs.python.org/3/reference/datamodel.html#object.__hash__), re: "values of `str`, `bytes` and `datetime` objects are “salted” with an unpredictable random value". – PM 2Ring Aug 20 '18 at 09:55
  • @tobias_k It's about hashing. Integers hash to themselves, with no salting. – PM 2Ring Aug 20 '18 at 09:56
  • @tobias_k Really? Was that over separate runs? The hash of a string won't change during an execution run, that would be disastrous. But try running this a few times: `python3 -c "print({str(i) for i in range(9)})"` – PM 2Ring Aug 20 '18 at 10:24
  • @PM2Ring Ah, good point. No, it was indeed in the same interactive session. Silly me... – tobias_k Aug 20 '18 at 10:38
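
The point made in the comments above (string hashes are salted per process, so set order can change between runs but not within one) can be checked with a small sketch. It assumes CPython and uses the `PYTHONHASHSEED` environment variable to pin the salt; the helper name `run_with_seed` and the sample strings are made up for illustration.

```python
import os
import subprocess
import sys

# Code executed in each fresh interpreter: print a set of strings as a list.
SNIPPET = "print(list({'apple', 'banana', 'cherry'}))"


def run_with_seed(seed):
    """Run SNIPPET in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on 3.6 too)
        check=True,
    )
    return result.stdout.strip()


# Same seed in two separate processes: the order is reproducible.
first = run_with_seed(0)
second = run_with_seed(0)
print(first == second)  # True

# Different seeds: the order may differ from process to process.
orders = {run_with_seed(seed) for seed in range(1, 11)}
print(len(orders), "distinct orderings seen across 10 seeds")
```

Pinning `PYTHONHASHSEED` is useful for reproducing a run, but code that needs a stable order should probably not rely on it.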

1 Answer

I don't see this problem so far:

>>> values_list = list(range(100)) * 3  # list() is needed on Python 3, where a range cannot be multiplied
>>> a = list(set(values_list))
>>> b = list(set(values_list))
>>> a == b
True
>>> 
lenik