why did the implicit ordering change
Because set
is implemented as hashtable in CPython. So:
- The slot where an item is stored in the
set
depends on the (truncated) hash of the item (and to some extend the insertion order because the slot may be already occupied).
- The hash of the item depends on the
__hash__
method of that item.
And if you iterate over the set
you iterate over the hashtable (excluding missing entries) slot-by-slot. So by changing the hashes you may change the order.
And in your case the hashes are different because you changed the __hash__
method implementation, so it's to be expected that the order is different:
>>> [hash(tab) for tab in Tab.set(9)] # first code
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [hash(tab) for tab in Tab.set(9)] # second code
[3713072971709512581, 3713088127104978631, 3713071889183430056, 3713087044578896106, 3713083796991988331, 3713082714465905806, 3713085962048483481, 3713084879522400956, 3713081631935493181]
what might I do to preserve the implicit ordering so my set appears to be in order, just as before
If you want them to be ordered my suggestion would be to not use a set
- use an ordered collection, just to name one example: list
. There are also ways to efficiently remove duplicates from a list and preserving the order.
But if you want to keep the set
and want them to be sorted by the label
property you could also use sorted
:
sorted(tab.set(9), key=lambda t: t.label)
>>> [str(t) for t in sorted(Tab.set(9), key=lambda t: t.label)]
['1', '2', '3', '4', '5', '6', '7', '8', '9']
Inspecting the CPython 3.7.4 hashtable using Cython
Note: This is inspecting implementation details which may change across versions. The Cython code may not even work for different CPython versions. Don't take them literally and never ever depend on them.
In case you're interested how the actual implementation details of CPython look like, you can use this Cython helper for Jupyter I've written in this answer to "Updating a set while iterating over its elements":
%load_ext Cython
%%cython
from cpython cimport PyObject, PyTypeObject
cimport cython
cdef extern from "Python.h":
ctypedef Py_ssize_t Py_hash_t
struct setentry:
PyObject *key
Py_hash_t hash
ctypedef struct PySetObject:
Py_ssize_t ob_refcnt
PyTypeObject *ob_type
Py_ssize_t fill
Py_ssize_t used
Py_ssize_t mask
setentry *table
Py_hash_t hash
Py_ssize_t finger
setentry smalltable[8]
PyObject *weakreflist
cpdef print_set_table(set inp):
cdef PySetObject* innerset = <PySetObject *>inp
for idx in range(innerset.mask+1):
if (innerset.table[idx].key == NULL):
print(idx, '<EMPTY>')
else:
print(idx, innerset.table[idx].hash, <object>innerset.table[idx].key)
This will print the internal hashtable, each line contains the slot number, the hash and the stored item.
In the first case:
>>> print_set_table(Tab.set(9))
0 <EMPTY>
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 <EMPTY>
11 <EMPTY>
[...]
30 <EMPTY>
31 <EMPTY>
Second case:
>>> print_set_table(Tab.set(9))
0 <EMPTY>
[...]
4 <EMPTY>
5 3713072971709512581 9
6 <EMPTY>
7 3713088127104978631 7
8 3713071889183430056 8
9 <EMPTY>
10 3713087044578896106 6
11 3713083796991988331 3
12 <EMPTY>
13 <EMPTY>
14 3713082714465905806 2
15 <EMPTY>
[...]
24 <EMPTY>
25 3713085962048483481 5
26 <EMPTY>
27 <EMPTY>
28 3713084879522400956 4
29 3713081631935493181 1
30 <EMPTY>
31 <EMPTY>