2

I recently watched Raymond Hettingers talk about python dictionaries (and by extension sets...) and he mentioned that integers hash to themselves and that adding integers to a dict (or set...) will insert them in order and as long as you don't delete items the order will be preserved in python 3.6 (and probably above?). In the answer to this question it is stated that dictionaries preserve order of insertion, but for sets it seams like integers are ordered according to their value.

Now: according to the time-complexity section of python.org and in more detail here it is stated that the average time-complexity of adding an element to a set is O(1). This means if you have an unsorted list of integers it should be possible to sort them by simply doing:

sorted_list = list(set(unsorted_list))

This seams to be the case as far as I have tested it (did it a few 1000 times with random sequences).

My question is now: Does this mean it is possible to sort integers in python in O(n) time?

To me it would seam so, as it takes O(n) to build the set and O(n) to convert the set back to a list or am I missing something here?

colidyre
  • 4,170
  • 12
  • 37
  • 53
Chris
  • 710
  • 7
  • 15
  • Even if this does work, you should absolutely not rely on that behaviour. It's mostly an internal implementation detail you're relying on, and in other Python flavours or versions it may change. So: [mu](https://en.wikipedia.org/wiki/Mu_(negative)#"Unasking"_the_question). – deceze Apr 08 '20 at 10:36
  • Where does Python guarantee the behaviour of `hash()` with regards to integers and where do sets guarantee their ordering behaviour? I cannot find any such explicit guarantee. Also see: https://stackoverflow.com/q/6550660/476 – deceze Apr 08 '20 at 11:23
  • @deceze I'd say [this](https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types) counts as some guarantee for hasing integers. – Kelly Bundy Apr 08 '20 at 11:48
  • Note that on PyPy 3.6.9 (7.3.0), ``list({1, 8}) == list({8, 1})`` does not hold true. – MisterMiyagi Apr 08 '20 at 11:56
  • @MisterMiyagi Do you know why? Does PyPy keep sets in insertion order? – Kelly Bundy Apr 08 '20 at 12:01
  • 1
    @HeapOverflow It appears that ``set`` uses the same mechanism that makes ``dict`` ordered (which PyPy introduced in 2.7). ["Dictionaries and sets are ordered on PyPy. On CPython < 3.6 they are not; on CPython >= 3.6 dictionaries (but not sets) are ordered."](https://doc.pypy.org/en/latest/cpython_differences.html) – MisterMiyagi Apr 08 '20 at 12:08
  • 1
    @MisterMiyagi I [tried it](https://ideone.com/rtrOr7) (on PyPy 2.7) with a million random floats and it kept the order. So I guess insertion order it is. Presumably because for dicts it's guaranteed and they use the same mechanism for sets (unlike CPython). – Kelly Bundy Apr 08 '20 at 12:09
  • Raymond Hettinger is talking about integers that are small enough to fit into a machine integer. See https://docs.python.org/3/library/functions.html#hash & the linked docs for `__hash__`. Also note that -1 doesn't hash to itself. – PM 2Ring Apr 08 '20 at 20:34
  • FWIW, you can have two sets containing identical *strings* but when you convert them to lists they can compare unequal. Or they may not. ;) Here's some code I wrote that demonstrates this: https://stackoverflow.com/a/51578541/4014959 – PM 2Ring Apr 08 '20 at 20:39

3 Answers3

6

No, not in general. You must've tried it with special cases, for example where the unsorted input list contains all numbers from 0 to n, each once.

Here's a simple case that fails:

>>> list(set([8, 1]))
[8, 1]

Done with CPython 3.8.1 32-bit.

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
  • 1
    do you know if this is because of truncation of the hash when the set is small? – Chris Apr 08 '20 at 11:49
  • @Chris Yes, sets start at internal size 8, so hashes are taken modulo 8, so the number 8 gets hashed to 8 which then modulo 8 becomes 0. And thus ends up at index 0. While 1 ends up at index 1. [More here](https://stackoverflow.com/a/60879313/12671057). (Note: Some of this might only be true for current CPython, I don't think the `set` internals are guaranteed.) – Kelly Bundy Apr 08 '20 at 11:55
5

No. Sets do not allow sorting integers. While the hashes of integers are well-defined, the order of sets is arbitrary.

The order of sets may vary by implementation, process and instance.

# CPython 3.7.4
>>> list({1, 8})
[8, 1]
>>> list({8, 1})
[8, 1]
# PyPy 3.6.9 (PyPy 7.3.0)
>>> list({1, 8})
[1, 8]
>>> list({8, 1})
[8, 1]
# CPython 2.7.10
>>> list({1, 8})
[8, 1]
>>> list({8, 1})
[8, 1]
# Jython 2.7.1 (java13.0.2)
>>> list({1, 8})
[1, 8]
>>> list({8, 1})
[1, 8]

The order of sets may also depend on the history of an instance.

# CPython 3.7.4
>>> a = {1, 3, 4, 8}
>>> list(a)
[8, 1, 3, 4]
>>> a.add(2)
>>> list(a)
[1, 2, 3, 4, 8]
>>> a.discard(2)
>>> list(a)
[1, 3, 4, 8]
MisterMiyagi
  • 44,374
  • 10
  • 104
  • 119
4

Generally, O(n) sorting is possible for integers, strings and many other types of data. O(n log n) is the best you can do with sorting algorithms that only use comparisons (>, <, ==) to determine the order of items, but for many types, you're not limited to such algorithms. In particular, see Radix sort for sorting integers.

Ch3steR
  • 20,090
  • 4
  • 28
  • 58
Błotosmętek
  • 12,717
  • 19
  • 29