How to test the membership of multiple values in a list

Question

I want to test if two or more values have membership on a list, but I'm getting an unexpected result:

>>> 'a','b' in ['b', 'a', 'foo', 'bar']
('a', True)

So, Can Python test the membership of multiple values at once in a list? What does that result mean?

_{See also: How to find list intersection?. Checking whether any of the specified values is in the list, is equivalent to checking if the intersection is non-empty. Checking whether all the values are in the list, is equivalent to checking if they are a subset.}

senderle · Accepted Answer · 2017-06-26T20:46:26.383

This does what you want, and will work in nearly all cases:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
True

The expression 'a','b' in ['b', 'a', 'foo', 'bar'] doesn't work as expected because Python interprets it as a tuple:

>>> 'a', 'b'
('a', 'b')
>>> 'a', 5 + 2
('a', 7)
>>> 'a', 'x' in 'xerxes'
('a', True)

Other Options

There are other ways to execute this test, but they won't work for as many different kinds of inputs. As Kabie points out, you can solve this problem using sets...

>>> set(['a', 'b']).issubset(set(['a', 'b', 'foo', 'bar']))
True
>>> {'a', 'b'} <= {'a', 'b', 'foo', 'bar'}
True

...sometimes:

>>> {'a', ['b']} <= {'a', ['b'], 'foo', 'bar'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Sets can only be created with hashable elements. But the generator expression all(x in container for x in items) can handle almost any container type. The only requirement is that container be re-iterable (i.e. not a generator). items can be any iterable at all.

>>> container = [['b'], 'a', 'foo', 'bar']
>>> items = (i for i in ('a', ['b']))
>>> all(x in [['b'], 'a', 'foo', 'bar'] for x in items)
True

Speed Tests

In many cases, the subset test will be faster than all, but the difference isn't shocking -- except when the question is irrelevant because sets aren't an option. Converting lists to sets just for the purpose of a test like this won't always be worth the trouble. And converting generators to sets can sometimes be incredibly wasteful, slowing programs down by many orders of magnitude.

Here are a few benchmarks for illustration. The biggest difference comes when both container and items are relatively small. In that case, the subset approach is about an order of magnitude faster:

>>> smallset = set(range(10))
>>> smallsubset = set(range(5))
>>> %timeit smallset >= smallsubset
110 ns ± 0.702 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit all(x in smallset for x in smallsubset)
951 ns ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This looks like a big difference. But as long as container is a set, all is still perfectly usable at vastly larger scales:

>>> bigset = set(range(100000))
>>> bigsubset = set(range(50000))
>>> %timeit bigset >= bigsubset
1.14 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit all(x in bigset for x in bigsubset)
5.96 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using subset testing is still faster, but only by about 5x at this scale. The speed boost is due to Python's fast c-backed implementation of set, but the fundamental algorithm is the same in both cases.

If your items are already stored in a list for other reasons, then you'll have to convert them to a set before using the subset test approach. Then the speedup drops to about 2.5x:

>>> %timeit bigset >= set(bigsubseq)
2.1 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And if your container is a sequence, and needs to be converted first, then the speedup is even smaller:

>>> %timeit set(bigseq) >= set(bigsubseq)
4.36 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The only time we get disastrously slow results is when we leave container as a sequence:

>>> %timeit all(x in bigseq for x in bigsubseq)
184 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And of course, we'll only do that if we must. If all the items in bigseq are hashable, then we'll do this instead:

>>> %timeit bigset = set(bigseq); all(x in bigset for x in bigsubseq)
7.24 ms ± 78 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That's just 1.66x faster than the alternative (set(bigseq) >= set(bigsubseq), timed above at 4.36).

So subset testing is generally faster, but not by an incredible margin. On the other hand, let's look at when all is faster. What if items is ten-million values long, and is likely to have values that aren't in container?

>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); set(bigset) >= set(hugeiter)
13.1 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); all(x in bigset for x in hugeiter)
2.33 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Converting the generator into a set turns out to be incredibly wasteful in this case. The set constructor has to consume the entire generator. But the short-circuiting behavior of all ensures that only a small portion of the generator needs to be consumed, so it's faster than a subset test by four orders of magnitude.

This is an extreme example, admittedly. But as it shows, you can't assume that one approach or the other will be faster in all cases.

The Upshot

Most of the time, converting container to a set is worth it, at least if all its elements are hashable. That's because in for sets is O(1), while in for sequences is O(n).

On the other hand, using subset testing is probably only worth it sometimes. Definitely do it if your test items are already stored in a set. Otherwise, all is only a little slower, and doesn't require any additional storage. It can also be used with large generators of items, and sometimes provides a massive speedup in that case.

"doesn't work as expected because Python interprets it as a tuple:" To be more precise, it's because of the order of operations: `in` is evaluated before the first `,`. — Karl Knechtel, Jul 06 '22 at 22:11

score 74 · Answer 2 · answered May 28 '11 at 02:44

74

Another way to do it:

>>> set(['a','b']).issubset( ['b','a','foo','bar'] )
True

answered May 28 '11 at 02:44

Kabie

10,489
1
38
45

23

Fun fact: `set(['a', 'b']) <= set(['b','a','foo','bar'])` is another way to spell the same thing, and looks "mathier". – Kirk Strauser May 28 '11 at 03:55
8

As of Python 2.7 you can use `{'a', 'b'} <= {'b','a','foo','bar'}` – Viktor Stískala Mar 22 '12 at 19:28

score 28 · Answer 3 · edited Oct 22 '19 at 08:31

28

If you want to check all of your input matches,

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

if you want to check at least one match,

>>> any(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

edited Oct 22 '19 at 08:31

Georgy

12,464
7
65
73

answered Dec 10 '18 at 09:48

Mohideen bin Mohammed

18,813
10
112
118

score 12 · Answer 4 · edited Oct 22 '19 at 08:30

12

I'm pretty sure in is having higher precedence than , so your statement is being interpreted as 'a', ('b' in ['b' ...]), which then evaluates to 'a', True since 'b' is in the array.

See previous answer for how to do what you want.

edited Oct 22 '19 at 08:30

Georgy

12,464
7
65
73

answered May 28 '11 at 02:36

Foon

6,148
11
40
42

score 4 · Answer 5 · answered May 28 '11 at 02:37

The Python parser evaluated that statement as a tuple, where the first value was 'a', and the second value is the expression 'b' in ['b', 'a', 'foo', 'bar'] (which evaluates to True).

You can write a simple function do do what you want, though:

def all_in(candidates, sequence):
    for element in candidates:
        if element not in sequence:
            return False
    return True

And call it like:

>>> all_in(('a', 'b'), ['b', 'a', 'foo', 'bar'])
True

score 2 · Answer 6 · edited Oct 22 '19 at 08:47

2

I would say we can even leave those square brackets out.

array = ['b', 'a', 'foo', 'bar']
all([i in array for i in 'a', 'b'])

edited Oct 22 '19 at 08:47

Georgy

12,464
7
65
73

answered Feb 02 '17 at 09:08

szabadkai

96
5

Be careful when using the word `array` for a `list` interchangeably. These two things are different in Python. Here is the link to the `array` module. https://docs.python.org/3/library/array.html – Brett La Pierre Dec 12 '22 at 17:34

dmchdev · Answer 7 · 2017-11-16T17:17:07.990

[x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]

The reason I think this is better than the chosen answer is that you really don't need to call the 'all()' function. Empty list evaluates to False in IF statements, non-empty list evaluates to True.

if [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]:
    ...Do something...

Example:

>>> [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]
['a', 'b']
>>> [x for x in ['G','F'] if x in ['b', 'a', 'foo', 'bar']]
[]

score 1 · Answer 8 · answered May 26 '12 at 13:21

1

Both of the answers presented here will not handle repeated elements. For example, if you are testing whether [1,2,2] is a sublist of [1,2,3,4], both will return True. That may be what you mean to do, but I just wanted to clarify. If you want to return false for [1,2,2] in [1,2,3,4], you would need to sort both lists and check each item with a moving index on each list. Just a slightly more complicated for loop.

answered May 26 '12 at 13:21

user1419042

81
1
1

2

'both'? There are more than two answers. Did you mean that all the answers suffer from this problem, or only two of the answers (and if so, which ones)? – Wipqozn May 28 '12 at 18:08

score 0 · Answer 9 · edited Mar 06 '20 at 15:20

0

Here's how I did it:

A = ['a','b','c']
B = ['c']
logic = [(x in B) for x in A]
if True in logic:
    do something

edited Mar 06 '20 at 15:20

Alireza HI

1,873
1
12
20

answered Mar 06 '20 at 14:10

John

11
1

score 0 · Answer 10 · answered Sep 03 '22 at 01:49

Any

In Python3 you can use intersection of sets as an any:

>>> {'a','b'} & set(['b', 'a', 'foo', 'bar'])
{'a', 'b'}

>>> {'a','b'} & set(['b', 1, 'foo', 'bar'])
{'b'}

of course you could wrap the result in a bool for True/False values:

>>> bool({'a','b'} & set(['b', 1, 'foo', 'bar']))
True

>>> bool({'c'} & set(['b', 1, 'foo', 'bar']))
False

All

Making use of `is subset:

>>> {'a','b'}.issubset(set(['b', 'a', 'foo', 'bar']))
True

>>> {'a','b'}.issubset(set(['b', 1, 'foo', 'bar']))
False

Notes

bool() turns set into boolean
issubset() looks for a set to be a complete subset of another set
& can be used for interesection (any) of sets

The examples can be cleaned up by using variables:

test = {'a','b'}
values = set(['b', 'a', 'foo', 'bar'])

# Any
test & values         # {'a', 'b'}
bool(test & values)   # True

# All
test.issubset(values) # True

Hutch · Answer 11 · 2013-09-26T20:50:07.723

-1

how can you be pythonic without lambdas! .. not to be taken seriously .. but this way works too:

orig_array = [ ..... ]
test_array = [ ... ]

filter(lambda x:x in test_array, orig_array) == test_array

leave out the end part if you want to test if any of the values are in the array:

filter(lambda x:x in test_array, orig_array)

edited Sep 26 '13 at 20:50

answered Sep 26 '13 at 20:36

Hutch

10,392
1
22
18

1

Just a heads up that this won't work as intended in Python 3 where `filter` is a generator. You'd need to wrap it in `list` if you wanted to actually get a result that you could test with `==` or in a boolean context (to see if it is empty). Using a list comprehension or a generator expression in `any` or `all` is preferable. – Blckknght Sep 26 '13 at 20:59

Marwa Matar · Answer 12 · 2022-08-23T13:18:06.387

# This is to extract all count of all combinations inside list of 
# list
import itertools

l = [[1,2,3],[6,5,4,3,7,2],[4,3,2,9],[6,7],[5,1,0],[6,3,2,7]]    
els = list(set(b for a in l for b in a))    
sol = {}    

def valid(p):    
    for s in l:    
        if set(p).issubset(set(s)):    
            if p in sol.keys():    
                sol[p] += 1    
            else:    
                sol[p] = 1    

for c in itertools.combinations(els, 2):    
    valid(c)  
# {(0, 1): 1, 
# (0, 5): 1,
# (1, 2): 1,
# (1, 3): 1,
# (1, 5): 1,
# (2, 3): 4,
# (2, 4): 2,
# (2, 5): 1,
# (2, 6): 2,
# (2, 7): 2,
# (2, 9): 1,
# (3, 4): 2,
# (3, 5): 1,
# (3, 6): 2,
# (3, 7): 2,
# (3, 9): 1,
# (4, 5): 1,
# (4, 6): 1,
# (4, 7): 1,
# (4, 9): 1,
# (5, 6): 1,
# (5, 7): 1,
# (6, 7): 3}

This code is unrelated to the question that it has been posted as an answer to. — Rich L, Aug 26 '22 at 16:24

How to test the membership of multiple values in a list

12 Answers12

Other Options

Speed Tests

The Upshot

Any

All

Notes

Linked

Related