27

I understand that the elements of a python set are not ordered. Calling the pop method returns an arbitrary element; I'm fine with that.

What I'm wondering is whether or not pop will ALWAYS return the same element when the set has the same history. Within one version of Python, of course; I don't mind if different versions/implementations of Python do their own thing. In particular, I'm asking about Python 2.7. It's a matter of implementation more than of API in this case.

I'm using sets a lot in a procedural dungeon generator for a game, and I'd like the outcome to be deterministic for a given seed.

martineau
Niriel
  • 1
    Related: http://stackoverflow.com/questions/3949310/how-is-cpythons-set-implemented and http://svn.python.org/view/python/trunk/Objects/setobject.c?view=markup – ChristopheD May 03 '12 at 13:10
  • 2
    Why not test it / look at the source? – Marcin May 03 '12 at 13:12
  • @delnan " In particular, I'm asking about python 2.7. It's a matter of implementation more than of api in this case." So, no need to test several versions, or future versions as you suggest. You seem to have imagined a requirement for portability and eternity. – Marcin May 03 '12 at 13:34
  • 1
    @Marcin Ah yes, sorry. I missed that. It's kind of a knee-jerk reaction, many comments like this actually have the flaws I wrongly accused yours of ("Is ... undefined behavior in C89?" -- "Just try if your version of your compiler, on your machine, with the flags you happen to choose, makes it work"). –  May 03 '12 at 13:35
  • 2
    If you have the _exact_ same set of objects, and you can guarantee that the same hash function is used, then yes, `set.pop()`, as well as `list(set())` can be deterministic. – Joel Cornett May 03 '12 at 13:38
  • The bounty that Karl put on this question is missing the point. He wants a canonical answer. The canonical answer has already been given: assume non-determinacy when using `set.pop()`. If you want to know a deterministic way to remove elements from a set, then ask that as another question. This one is already answered. – Mike Williamson Dec 06 '22 at 17:22

5 Answers

35

The answer in general is no. The python source that @Christophe and @Marcin (un)helpfully point to shows that elements are popped in the order they appear in the hash table. So, pop order (and presumably iteration order) is deterministic, but only for fixed hash values. That's the case for numbers but not for strings, according to the Note in the documentation of __hash__, which incidentally also touches on your question directly:

Note by default the hash() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

[ ... ]

Changing hash values affects the iteration order of dicts, sets and other mappings. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).

Edit: As @Marcin points out, the link I quoted does not apply to Python 2. Hash randomization became the default with Python 3.3. Python 2.7 does not have intentionally non-deterministic string hashing by default.

In general, this is a problem for any object whose hash is not a repeatable function of its value (e.g., if the hash is based on memory address). But conversely, if you define your own __hash__ method for the objects in your sets, you can expect that they will be returned in a reproducible order (provided the set's history and the platform are kept fixed).
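A minimal sketch of the hash dependence described above, assuming an interpreter where the `PYTHONHASHSEED` environment variable controls string hash salting (Python 3.3+; on 2.7 randomization is off unless it is explicitly enabled). The helper name `pop_order` and the sample strings are made up for illustration.

```python
import os
import subprocess
import sys

# Child program: build a set of strings, then pop every element in turn.
CHILD = (
    "s = set(['orc', 'troll', 'goblin', 'kobold', 'lich'])\n"
    "print(','.join(s.pop() for _ in range(len(s))))\n"
)

def pop_order(seed):
    """Return the pop order seen by a fresh interpreter run with this hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return out.decode().strip().split(",")

# Same hash seed and same set history: the pop order repeats across processes.
first = pop_order(0)
second = pop_order(0)
print(first == second)
```

Running `pop_order` with different seeds may or may not give different orders; the only thing this sketch relies on is that a fixed seed gives fixed string hashes.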

alexis
  • 1
    You are referring to the documentation for the dev version of python. This question is about python 2.7, and the text you quote does not appear in the corresponding document for that version: http://docs.python.org/reference/datamodel.html#object.__hash__ – Marcin May 03 '12 at 14:22
6

Internally I think the situation is similar to dict. The order is determined by a hash algorithm, which in some situations will yield the same results. But you should not depend on that, since once the number of elements gets large, the set will encounter collisions (that is, in its internal hashing), which eventually lead to a different ordering.

In short: No, set.pop() is not deterministic. Don't assume any order, since the API explicitly states that

a set object is an unordered collection

miku
4

The documentation does not specify that it must be deterministic, therefore you should assume that it isn't.

Ignacio Vazquez-Abrams
  • 4
    Given that the question appears to be about a specific version, there is no need to assume anything - the source can be checked, and behaviour tested. – Marcin May 03 '12 at 13:35
2

If you want to force determinism, you could try something like

value = min(my_set)
my_set.remove(value)
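The two lines above can be wrapped in a helper. For a seeded generator, picking from the sorted elements with a seeded `random.Random` is one possible alternative. This is only a sketch, assuming the elements are mutually orderable; `pop_min`, `pop_seeded`, and the sample data are hypothetical names, not anything from the question.

```python
import random

def pop_min(s):
    """Remove and return the smallest element; does not depend on hash order."""
    value = min(s)
    s.remove(value)
    return value

def pop_seeded(s, rng):
    """Remove and return an element picked by rng from the sorted elements."""
    value = rng.choice(sorted(s))
    s.remove(value)
    return value

rooms = set([(3, 1), (0, 2), (5, 5)])
print(pop_min(rooms))  # smallest tuple: (0, 2)

# Two generators with the same seed make the same picks.
rng_a = random.Random(42)
rng_b = random.Random(42)
picks_a = [pop_seeded(set(range(10)), rng_a) for _ in range(5)]
picks_b = [pop_seeded(set(range(10)), rng_b) for _ in range(5)]
print(picks_a == picks_b)
```

Both helpers cost more than `set.pop()` (a linear scan for `min`, a sort for `pop_seeded`), which may matter for large sets.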
Karl Knechtel
  • 2
    note that this is only deterministic when min() is unambiguous. It is possible to have a weird set with distinct values where there are two or more which are all less than all the others (and both are less than the other). not common in the wild, but possible. – ch3ka May 03 '12 at 13:43
  • 3
    A better example would be values that simply can't be ordered (i.e. which can only be compared for equality). A definition of `__lt__` which allows `x < y` and `y < x` simultaneously, while legal to write, is frankly broken. – Karl Knechtel May 03 '12 at 15:54
  • 3
    when the set cannot be ordered (e.g., a set of complex numbers), your solution will fail anyway with a TypeError. But consider `class epsilon(float): def __lt__(self, other): return True if 0 < other` – ch3ka May 03 '12 at 16:28
-1

If you really are targeting one particular version of python, then you can look at the source, and test its behaviour (but test well - consider load factors and the like).

If you want portability, or you find set doesn't perform as required, use an ordereddict (here's one: http://code.activestate.com/recipes/576693/ ; there are loads of others, so find one you like the look of), and adapt it as a set.

Update: here's an ordered set: http://packages.python.org/Brownie/api/datastructures.html#brownie.datastructures.OrderedSet
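As a rough sketch of the "adapt an ordered dict as a set" idea, assuming `collections.OrderedDict` from the standard library is acceptable: the class name `InsertionOrderedSet` is hypothetical, and this is not the API of either linked recipe.

```python
from collections import OrderedDict

class InsertionOrderedSet(object):
    """Set-like container whose pop() returns the oldest remaining element."""

    def __init__(self, iterable=()):
        self._items = OrderedDict()
        for item in iterable:
            self.add(item)

    def add(self, item):
        # Re-adding an existing item keeps its original position.
        self._items[item] = None

    def discard(self, item):
        self._items.pop(item, None)

    def pop(self):
        # popitem(last=False) removes in insertion (FIFO) order and
        # raises KeyError when empty, like set.pop().
        key, _ = self._items.popitem(last=False)
        return key

    def __contains__(self, item):
        return item in self._items

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

cells = InsertionOrderedSet(["door", "wall", "floor", "wall"])
print([cells.pop() for _ in range(len(cells))])  # ['door', 'wall', 'floor']
```

Here the pop order depends only on the order of insertions, not on hash values, so it stays the same as long as the set's history is the same.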

Marcin
  • Ordereddict is in stdlib in 2.7 and 3.1+ (http://docs.python.org/library/collections.html#collections.OrderedDict, http://docs.python.org/dev/library/collections.html#collections.OrderedDict) – miku May 03 '12 at 13:47
  • @miku Given that it's implemented in C, that cannot be adapted portably, as specified in the very same sentence you are responding to. – Marcin May 03 '12 at 14:19