1

I have two questions about sets.

1. So as I read sets are unordered, but when I started experimenting with them I found out that actually there is some kind of ordering thing.

As you can see, there is nothing special in this set:

>>> v_set ={88,11,1,33,21,3,7,55,37,8}
>>> v_set
{33, 1, 3, 37, 7, 8, 11, 21, 55, 88}

But this one is different:

>>> g_set={7,5,11,1,4,13,55,12,2,3,6,20,9,10}
>>> g_set
{1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 20, 55}

I guess, it's because this time I wrote down more closer numbers, and it started to make sense to set those numbers ascending sequence...?

2. And the second question is about pop(). I read that there is no way to control which value gets removed with pop() method, it is completely arbitrary. Bet when I use pop() method it always (I never saw differently) takes the first item from the left side in sets.

As you can see:

>>> v_set
{33, 1, 3, 37, 7, 8, 11, 21, 55, 88}
>>> v_set.pop()
33
>>> v_set.pop()
1
>>> v_set.pop()
3
>>> v_set.pop()
37
>>> v_set.pop()
7
>>> v_set.pop()
8
>>> v_set.pop()
11
>>> v_set.pop()
21
>>> v_set.pop()
55

So is it really completely arbitrary?

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Emils P.
  • 35
  • 5
  • 1
    Please keep your posts to just *one question*; your first issue is a duplicate of [Why is the order in Python dictionaries arbitrary?](http://stackoverflow.com/q/15479928) (sets are just dictionaries with just keys and no values). – Martijn Pieters Sep 29 '14 at 11:52
  • The order of `.pop()` is just as 'arbitrary' as the iteration order of a set; it makes little sense for Python to 'randomize' this. – Martijn Pieters Sep 29 '14 at 12:02
  • @MartijnPieters "_sets are just dictionaries with just keys and no values_" no they aren't, and they never were. – wim Nov 24 '22 at 05:22
  • @wim: yes, they are, and always have been. What makes you think they were not? Sets are hash tables, and they were implemented initially in pure Python as a dictionaries with `None` values, because that's how we created sets before we had sets. – Martijn Pieters Nov 25 '22 at 11:33
  • @MartijnPieters Nope. Here was [dictobject.c](https://github.com/python/cpython/blob/v3.4.1/Objects/dictobject.c) at the time (2014) and here's [setobject.c](https://github.com/python/cpython/blob/v3.4.1/Objects/setobject.c). They may both be implemented with hash tables, but there's no code re-use and it wouldn't be correct to say that sets are dictionaries. The implementation is different, for example [sets use a combination of linear probing and open addressing, to improve cache locality](https://github.com/python/cpython/blob/v3.4.1/Objects/setobject.c#L20-L22) - dicts don't do that. – wim Nov 26 '22 at 02:51
  • @wim: I didn't say they shared code, they share the same concepts. It really helps understanding when you realise they are both hash tables. – Martijn Pieters Nov 26 '22 at 15:05

3 Answers3

0

Yes, the ordering is arbitrary, by definition. Even if items where stored in sorted order, it would still be arbitrary. "Arbitrary" means that Python doesn't promise to order data in any particular way. Because memory is linear it has to use some ordering, but you should never rely on that ordering because it may be changed without notice. (In fact, in the latest versions of Python, the order of items in a set is partially randomized.)

Your second example shows that the order of printing is the same as the order of popping. This makes sense: repr walks the items in the order they are stored in memory, and pop apparently returns the first item according to that same order. Again, you cannot rely on this: it's an implementation detail and if the Python developers figure out a faster way to do pop, they're free to break any code that relies on set ordering.

If you want to know how this works, read up on hash tables.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
0

It is not completely arbitrary. But it does not matter.

We call the set unordered because you, as a user or client of that code, must not depend on a specific order. However, based on details of the implementation of the set, it is likely that there is some order.

The same with regards to pop(). It is very likely that the specific implementation of python you use has logic that will lead to clearly deterministic results. However, your code might be used with a python interpreter that uses a different implementation. A random element is the only guarantee you get from the implementation.

To summarize, the documentation gives you a set of guarantees that any compliant python implementation will follow. Additional effects that you observe are implementation details and might change at any time.

Wilbert
  • 7,251
  • 6
  • 51
  • 91
0

Note that the order of the elements depends (also) on the order of the insertions. You can easily see this when there are collisions:

In [4]: class Bad:
   ...:     def __init__(self, val, hash_val):
   ...:         self.val = val
   ...:         self.hash_val = hash_val
   ...:     def __str__(self):
   ...:         return 'Bad({0.val}, {0.hash_val})'.format(self)
   ...:     __repr__ = __str__
   ...:     def __eq__(self, other):
   ...:         return self.val == other.val
   ...:     def __hash__(self):
   ...:         return self.hash_val

In [5]: b1 = Bad(1, 1)
   ...: b2 = Bad(2, 1)
   ...: b3 = Bad(3, 2)

In [6]: {b1, b2, b3}
Out[6]: {Bad(2, 1), Bad(3, 2), Bad(1, 1)}

In [7]: {b2, b1, b3}
Out[7]: {Bad(1, 1), Bad(3, 2), Bad(2, 1)}

As you can see in Out[6] the first element is Bad(2, 1) and the last is Bad(1, 1) while in Out[7] the first is Bad(1, 1) and the last is Bad(2, 1).

If there were no collisions:

In [8]: b1 = Bad(1, 1)
   ...: b2 = Bad(2, 2)
   ...: b3 = Bad(3, 3)

In [9]: {b1, b2, b3}
Out[9]: {Bad(1, 1), Bad(2, 2), Bad(3, 3)}

In [10]: {b2, b1, b3}
Out[10]: {Bad(1, 1), Bad(2, 2), Bad(3, 3)}

note how the order didn't change. (Well, the hash is used modulus some n so there can be collisions even if the hashes are different, depending on the size of the underlying table).

In other words the values aren't enough to determine the order of the elements of a set, even if you know how they are implemented. You must also know the order of the insertions.

In general sets do have a well defined order inside a single interpreter run (due to randominzation in python3.3+), however which order is used is depends on the insertions performed (both the value and the order in which they are done), and is arbitrary, i.e. in python3.5 they can change the order without notice, so you cannot rely on it.

They could truly randomize the outputs but this would simply add overhead for no benefit.

Bakuriu
  • 98,325
  • 22
  • 197
  • 231