654

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it - something I can only be sure of after an asynchronous call to another host.

Quick and dirty:

>>> elem = s.pop()
>>> s.add(elem)

But do you know of a better way? Ideally in constant time.

MSeifert
  • 145,886
  • 38
  • 333
  • 352
Daren Thomas
  • 67,947
  • 40
  • 154
  • 200
  • 33
    Anyone know why python doesn't already have this function implemented? – hlin117 Jun 22 '15 at 07:50
  • 6
    What's the use case? Set doesn't have this ability for a reason. You supposed to iterate through it and make set related operations like `union` etc not taking elements from it. For example `next(iter({3,2,1}))` always returns `1` so if you thought that this would return random element - it wouldn't. So maybe you just using the wrong data structure? What's the use case? – user1685095 May 28 '17 at 13:59
  • 1
    Related: https://stackoverflow.com/questions/20625579/access-the-sole-element-of-a-set (I know, it's not the same question, but there are worthwhile alternatives and insights there.) – John Y Nov 13 '17 at 19:47
  • 1
    @hlin117 Because set is an [unordered collection](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset). Since no order is expected, it makes no sense to retrieve an element at given position - it is expected to be random. – Jeyekomon Dec 03 '19 at 11:16
  • 1
    @hlin117 So why does this then make no sense? It is called "drawing with replacement"... – Radio Controlled Nov 20 '20 at 07:36
  • I think the problem is that the values are not returned at random and so they do not want to create the impression that they are. Probably the real randomness would be too much overhead. But still for testing purposes it would be straightforward to have some `.get()` function or so... – Radio Controlled Nov 20 '20 at 08:04
  • 1
    b = (a-set()).pop() – Necho Dec 23 '20 at 19:39
  • @Necho: I _love_ this one!! Could also just do: `set(a).pop()`? – Daren Thomas Jan 14 '21 at 11:27
  • 1
    @DarenThomas I'm not an expert on Python. What I know is that using set(a) the time complexity is O(len(a)) so it is the same as copy(). In case of difference s-t the time complexity is O(len(t)) which in my example is zero, but maybe internally the operation (...).pop() is creating a copy, I don't know. That is something that an expert could tell us :) – Necho Feb 27 '21 at 03:20
  • 4
    One reasonable use case that I keep encountering is this: I am writing a test, and I get a set. I want to look at any value in it to build more data for the test. I don't care which one I get, and I don't really care if its the same or different each time. I just need _a_ value from the set. – Troy Daniels Dec 03 '21 at 18:33
  • @user1685095 If I want to iterate over a set while modifying it. – Alkenrinnstet Mar 19 '23 at 13:28
  • @user1685095 in my case I want a while loop that creates new names until it finds something not in the set. To make it run at least once through the loop I need to have an initial condition where my trial name is guaranteed to be in the set. I don't care where in the set. – Mark Ransom Jul 24 '23 at 17:09

15 Answers15

811

Two options that don't require copying the whole set:

for e in s:
    break
# e is now an element from s

Or...

e = next(iter(s))

But in general, sets don't support indexing or slicing.

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
Blair Conrad
  • 233,004
  • 25
  • 132
  • 111
  • 4
    This answers my question. Alas, I guess I will still use pop(), since iteration seems to sort the elements. I would prefer them in random order... – Daren Thomas Sep 12 '08 at 20:17
  • 21
    I don't think that the iter() is sorting the elements - when I create a set and pop() until it's empty, I get consistent (sorted, in my example) ordering, and it's the same as the iterator - pop() doesn't promise random order, just arbitrary, as in "I promise nothing". – Blair Conrad Sep 12 '08 at 20:20
  • You're absolutely right. I haven't kept up with changes since 1.5.2, and amn't overly familiar with iterators and related functions. (Stack Overflow has taught me much already.) Thanks for the suggestion. I made it my own. :-) – Blair Conrad Sep 13 '08 at 00:39
  • 8
    +1 `iter(s).next()` is not gross but great. Completely general to take arbitrary element from any iterable object. Your choice if you want to be careful if the collection is empty though. – u0b34a0f6ae Oct 23 '09 at 13:25
  • 18
    next(iter(s)) is also OK and I tend to think it reads better. Also, you can use a sentinel to handle the case when s is empty. E.g. next(iter(s), set()). – j-a Jul 22 '12 at 06:34
  • Is it safe to itrtate over elements even if you are removing someof them in the loop, such as for elem in s if elem>0: s.remove(elem) – highBandWidth Nov 29 '15 at 05:27
  • @highBandWidth: I think python will raise an error if you do that. You're not allowed to iterate through a set that you're removing elements from. – hlin117 Mar 27 '16 at 20:44
  • 18
    `next(iter(your_list or []), None)` to handle None sets and empty sets – MrE Jul 31 '18 at 14:22
  • in cpython pop and iter are in the same order – Erik Aronesty May 04 '22 at 15:41
204

Least code would be:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

Obviously this would create a new list which contains each member of the set, so not great if your set is very large.

John
  • 14,944
  • 12
  • 57
  • 57
  • 31
    @augurar: Because it gets the job done in a relatively simple manner. And sometimes that's all that matters in a quick script. – tonysdg Feb 21 '17 at 01:58
  • 4
    @augurar I think people voted on this answer because `set` is not made for indexing and slicing primarily; and this user just shifted the coder to use the suitable datatype for such work i.e. `list`. – Vicrobot Jul 14 '18 at 15:26
  • 9
    @Vicrobot Yeah but it does so by copying the entire collection and turning an O(1) operation into an O(n) operation. This is a terrible solution that no one should ever use. – augurar Jul 16 '18 at 22:01
  • 24
    Also if you're just aiming for "least code" (which is dumb), then `min(s)` uses even fewer characters while being just as terrible and inefficient as this. – augurar Jul 16 '18 at 22:06
  • 16
    +1 for the code golf winner, which I have a practical counterexample for being "terrible and inefficient": `min(s)` is slightly faster than `next(iter(s))` for sets of size 1, and I came to this answer specifically looking to special-case extracting the only element from sets of size 1. – lehiester Jan 18 '19 at 19:49
  • 3
    @augurar this solution also allows me to retrieve any index (eg `list(s)[2]`) - in my "small test world" this is quite handy! – Qaswed May 08 '19 at 12:28
  • 1
    @CecilCurryand @augurar It's not O(n) if you have a fixed number of elements. Chances are you Googled this question because you've got a 1-element set and want its element. I'm not gonna remember `next(iter(s))`; I know `list` copies it and will search for the iterator solution if that's a concern. – sudo Oct 02 '19 at 00:32
  • 1
    Using [Iterable Unpacking](https://www.python.org/dev/peps/pep-3132/) `[*s][0]` is now the shortest solution. – grahamcracker1234 Dec 26 '20 at 23:34
  • "least code" is not dumb. Simpler, readable code is good practice and it depends on your design constraints as to if how much this affects your efficiency would alter your choice to use this method. – openCivilisation May 19 '21 at 01:43
  • 4
    If set is guaranteed to be one element, you can unpack into a tuple. `elem, = s` – pdpAxis Jun 03 '21 at 14:46
  • @openCivilisation the solution with the least amount of characters is anything but simple and readable – tbrugere Aug 24 '23 at 00:11
190

I wondered how the functions will perform for different sets, so I did a benchmark:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

enter image description here

This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.

However as long as one of the constant time approaches is used the performance difference will be negligible.


iteration_utilities (Disclaimer: I'm the author) contains a convenience function for this use-case: first:

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

I also included it in the benchmark above. It can compete with the other two "fast" solutions but the difference isn't much either way.

MSeifert
  • 145,886
  • 38
  • 333
  • 352
  • 3
    This is a great answer. Thanks for putting in the time to make it empirical. – Eric McLachlan Mar 21 '21 at 06:50
  • graph gives more attention to answer – DikShU Jul 03 '21 at 12:40
  • 1
    I have a short question, why do you use break in the ForLoop instead of using `return e` directly? The function should "break" the moment the return is executed. – Andreas Sep 21 '21 at 14:55
  • @Andreas That's a good and valid point. Thanks for bringing it up. But for the "why": I wanted to compare the runtime from the other answers so I simply copied the approach from those. In this case the answer had the `break` (ref https://stackoverflow.com/a/59841)... not a good answer but I simply didn't want to change their code too much. – MSeifert Sep 23 '21 at 09:36
  • @MSeifert, I see. I was a bit confused because I thought I missed something fundamentaly. Thank you for the clarification! – Andreas Sep 23 '21 at 13:24
  • I decided to use your package - I would love to have it throw an error if the set is empty. Just a thought for the future – Adventure-Knorrig Dec 14 '21 at 16:15
  • @DanielJerrehian - `iteration_utilities.first` should throw an `IndexError` if the set is empty and no `default` argument was provided. – MSeifert Dec 14 '21 at 17:59
  • @MSeifert - Sorry, typo - I meant "not" throw an error if it is empty – Adventure-Knorrig Dec 14 '21 at 18:01
  • 1
    @DanielJerrehian In that case you can provide a default value `first(set(), default=None)` for example :) – MSeifert Dec 14 '21 at 18:01
  • I would love to do some kind of 'favourite' on this post but I can't see a way. I've upvoted, but I suspect, but I guess that won't do it. – TheFaithfulLearner Jun 26 '22 at 21:38
72

tl;dr

for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.

y u do this

Welcome to yet another set of Python 3.x timings, extrapolated from wr.'s excellent Python 2.x-specific response. Unlike AChampion's equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:

Code Snippets for Great Joy

Turn on, tune in, time it:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Quickly Obsoleted Timeless Timings

Behold! Ordered by fastest to slowest snippets:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

Faceplants for the Whole Family

Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.

Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.

I just wish the amorphous They would PEP up a set.get_first() method for us already. If you're reading this, They: "Please. Do something."

Cecil Curry
  • 9,789
  • 5
  • 38
  • 52
  • 3
    I think complaining that that `next(iter(s))` is twice slower than `for x in s: break` in `CPython` is kind of strange. I mean that is `CPython`. It will be about 50-100 times (or something like that) slower than C or Haskell doing the same thing (for the most of the time, especially so in iteration, no tail call elimination and no optimizations whatsoever.). Loosing some microseconds doesn't make a real difference. Don't you think? And there's also PyPy – user1685095 May 28 '17 at 14:06
  • 6
    Since sets are not ordered, a `set.get_first()` could be misleading. But I would like a `set.get_any()`, which returns any element from the set, even if that element is always the same. – Eduardo Mar 19 '22 at 02:06
40

To provide some timing figures behind the different approaches, consider the following code. The get() is my custom addition to Python's setobject.c, being just a pop() without removing the element.

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

The output is:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).

wr.
  • 2,841
  • 1
  • 23
  • 27
  • Does anyone have an idea why iter(s).next() is so much slower than the other possibilities, even slower than s.add(s.pop())? For me it feels like very bad design of iter() and next() if the timings look like that. – peschü Jun 24 '15 at 10:35
  • Well for one that line creates a new iter object each iteration. – Ryan Sep 13 '15 at 19:37
  • 5
    @Ryan: Isn't an iterator object created implicitly for `for x in s` as well? ["An iterator is created for the result of the `expression_list`."](https://docs.python.org/2/reference/compound_stmts.html#the-for-statement) – musiphil Oct 05 '15 at 17:25
  • 2
    @musiphil That is true; originally I missed the "break" one being at 0.14, that is really counter-intuitive. I want to do a deep dive into this when I have time. – Ryan Oct 05 '15 at 17:36
  • 1
    I know this is old, but when adding `s.remove()` into the mix the `iter` examples both `for` and `iter` go catastrophically bad. – AChampion Jan 24 '16 at 08:27
29

Since you want a random element, this will also work:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

The documentation doesn't seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn't random; the order is undefined but predictable:

>>> list(set(range(10))) == range(10)
True 

If randomness is important and you need a bunch of elements in constant time (large sets), I'd use random.sample and convert to a list first:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time
dF.
  • 74,139
  • 30
  • 130
  • 136
  • 14
    If you just want one element, random.choice is more sensible. – Gregg Lind Nov 03 '08 at 17:28
  • list(s).pop() will do if you don't care which element to take. – Evgeny May 12 '14 at 00:11
  • 11
    @Gregg: You can't use `choice()`, because Python [will try to index your set](https://hg.python.org/cpython/file/754c630c98a3/Lib/random.py#l250) and that doesn't work. – Kevin Jan 20 '15 at 21:13
  • 5
    While clever, this is actually **the slowest solution yet suggested by an order of magnitude.** Yes, it's _that_ slow. Even converting the set into a list just to extract the first element of that list is faster. For the non-believers amongst us (_...hi!_), see these [fabulous timings](https://stackoverflow.com/a/40054478/2809027). – Cecil Curry Oct 15 '16 at 03:03
22

Yet another way in Python 3:

next(iter(s))

or

s.__iter__().__next__()
dzang
  • 2,160
  • 2
  • 12
  • 21
15

Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):

e,*_=s

With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):

[*s][0]

Both options are roughly 1000 times slower on my machine than the for-loop method.

skovorodkin
  • 9,394
  • 1
  • 39
  • 30
  • 7
    The for loop method (or more accurately the iterator method) has O(1) time complexity, while these methods are O(N). They are *concise* though. :) – ForeverWintr Sep 12 '19 at 05:27
6

Following @wr. post, I get similar results (for Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Output:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Results in:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272
AChampion
  • 29,683
  • 4
  • 59
  • 75
6

I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None
Nick
  • 21,555
  • 18
  • 47
  • 50
3

What I usually do for small collections is to create kind of parser/converter method like this

def convertSetToList(setName):
return list(setName)

Then I can use the new list and access by index number

userFields = convertSetToList(user)
name = request.json[userFields[0]]

As a list you will have all the other methods that you may need to work with

Josué Carvajal
  • 114
  • 1
  • 9
2

You can unpack the values to access the elements:

s = set([1, 2, 3])

v1, v2, v3 = s

print(v1,v2,v3)
#1 2 3
seralouk
  • 30,938
  • 9
  • 118
  • 133
  • I suppose you could unpack to `v1, _*`. Without a wildcard, you'd need to exactly match the number of elements. But as noted in the previous answer https://stackoverflow.com/a/45803038/15416, this is slow – MSalters Aug 04 '21 at 14:27
0

I f you want just the first element try this: b = (a-set()).pop()

Necho
  • 484
  • 3
  • 5
-3

How about s.copy().pop()? I haven't timed it, but it should work and it's simple. It works best for small sets however, as it copies the whole set.

Solomon Ucko
  • 5,724
  • 3
  • 24
  • 45
-8

Another option is to use a dictionary with values you don't care about. E.g.,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

You can treat the keys as a set except that they're just an array:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It's maybe not the best answer but it's another option.

Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()
Pat Notz
  • 208,672
  • 30
  • 90
  • 92
  • 3
    This doesn't work the way you hope it will. In python 2 keys() is an O(n) operation, so you're no longer constant time, but at least keys[0] will return the value you expect. In python 3 keys() is an O(1) operations, so yay! However, it no longer returns a list object, it returns a set-like object that can't be indexed, so keys[0] would throw TypeError. http://stackoverflow.com/questions/39219065/what-is-the-time-complexity-of-dict-keys-in-python – sage88 Apr 07 '17 at 19:55