I want to choose a random element (uniform distribution) from a python set in O(1) time. Is this possible? I've seen it suggested that one first convert the set to a list then select the random element from the list however this will take O(n) time where n is the size of the set. If this is not possible, what is a reasonably fast alternative?
Asked
Active
Viewed 3,241 times
3
-
1You can use a dict and keep the keys and vals the same, then call `random.choice` on that. What's your use case--is a set required? As far as I know, I don't think there's a way to do better than O(n) but maybe I'm mistaken since intuitively it seems like it should be no problem. – ggorlen Sep 13 '20 at 23:21
-
I don't think you can randomly select from a set in O(1). Using a list seems like a reasonable approach. Unless your set is huge, it doesn't seem like it should be a problem. – khelwood Sep 13 '20 at 23:40
-
2[Time complexity](https://wiki.python.org/moin/TimeComplexity), from the Python wiki – jsmart Sep 14 '20 at 00:02
-
1If you really need a set data-structure, there are a number of third-party `orderedset` implementations available — see [PyPi](https://pypi.org), and it should be possible to remove a random element from one that supports indexing. Also see [Does Python have an ordered set?](https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) – martineau Sep 14 '20 at 00:12
-
Are you trying to have one data structure that exposes set-like operations as well as a `random()` one? – Alexandru Barbarosie Sep 14 '20 at 00:29
1 Answers
3
I think it is impossible to get a random element from a hashtable (how a set is implemented) because it does not support random access. random.choice
needs random access for this reason. You have a good alternative in set.pop
, but it doesn't appear to be uniform (see https://github.com/python/cpython/blob/master/Objects/setobject.c#L616).
As long as it's not performance critical in a tight loop, converting to a list should be fine. However, if it does matter a lot, maybe you can consider using a different data structure in the first place.

iz_
- 15,923
- 3
- 25
- 40