296

I need to verify if a list is a subset of another - a boolean return is all I seek.

Is testing equality on the smaller list after an intersection the fastest way to do this? Performance is of utmost importance given the number of datasets that need to be compared.

Adding further facts based on discussions:

  1. Will either of the lists be the same for many tests? It does as one of them is a static lookup table.

  2. Does it need to be a list? It does not - the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on.

What would be the optimal solution given the scenario?

nbro
  • 15,395
  • 32
  • 113
  • 196
IUnknown
  • 9,301
  • 15
  • 50
  • 76
  • You mention speed, perhaps numpy would be useful, depending on your use. – ninMonkey May 16 '13 at 04:39
  • 2
    Are the list items hashable? – wim May 16 '13 at 04:45
  • 5
    If order is important, this might be a good start - [StackOverflow - Best Way To Determine if a Sequence is in another sequence in Python](http://stackoverflow.com/questions/425604/best-way-to-determine-if-a-sequence-is-in-another-sequence-in-python) –  May 16 '13 at 05:17
  • do you need proper subset, or can they be equal? – törzsmókus Aug 16 '17 at 10:18
  • 4
    Why not set(list_a).issubset(set(list_b)) ? – SeF Sep 15 '17 at 15:21

14 Answers14

242
>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>> 
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False
max
  • 49,282
  • 56
  • 208
  • 355
Yulan Liu
  • 2,545
  • 1
  • 10
  • 3
  • Is this the optimum solution? What is the time requirement? – tsnorri Nov 24 '14 at 23:37
  • 39
    This looks nicest and writes simpliest, but the fastest should be `set(a).issubset(b)` because in this case you only convert `a` into set but not `b`, which saves time. You can use `timeit` to compare the time consumed in two commands. For example, `timeit.repeat('set(a) – Yulan Liu Nov 28 '14 at 16:50
  • 18
    @YulanLiu: Hate to break it to you, but [the very first thing `issubset` does is check if the argument is a `set`/`frozenset`, and if it isn't, it converts it to a temporary `set` for comparison, runs the check, then throws away the temporary `set`.](https://hg.python.org/cpython/file/e86515294283/Objects/setobject.c#l1719) Timing differences (if any) would be a factor of small differences in LEGB lookup costs (finding `set` a second time is more expensive than attribute lookup on an existing `set`), but it's mostly a wash for large enough inputs. – ShadowRanger Jan 23 '16 at 03:41
  • 4
    If both the list contains same values, then this one is going to return false, the condition should be set(a) <= set(b) instead – ssi-anik Nov 23 '16 at 18:20
  • 5
    How can this answer be correct. He asked for a list not a set. They are completely different. What if a = [1, 3, 3, 5, 5] and b=[1, 3, 3, 3, 5]. Set theory is inappropriate for duplicates. – Eamonn Kenny Nov 14 '17 at 09:37
  • 1
    I would also point out that if a=[1,3,5] and b=[1,3,5], set(a) < set(b) will return False. You can add the equals operator to handle these cases: i.e. set(a) <= set(b). – Jon Dec 15 '17 at 15:14
  • I just tried a=[1,2,3] and b=[1,2,3]: set(a).issubset(b) and got True as the response in 3.10. I think either form will work. – limeri Mar 07 '22 at 17:15
222

Use set.issubset

Example:

a = {1,2}
b = {1,2,3}
a.issubset(b) # True
a = {1,2,4}
b = {1,2,3}
a.issubset(b) # False

The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it's the answer to your question, however.

A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.

Are you asking about subset or subsequence (which means you'll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?

Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.

Jossef Harush Kadouri
  • 32,361
  • 10
  • 130
  • 129
Yann Vernier
  • 15,414
  • 2
  • 28
  • 26
  • I am referring to a subset only and issubset performs just fine - Thanks. However I am curious about 2 questions in here. 1.Will either of the lists be the same for many tests? It does as one of them is a static lookup table 2.Does it need to be a list? It does not - the static lookup table can be anything that performs best. The dynamic one is a dict from which we extract the keys to perform a static lookup on. Will this fact alter the solution? – IUnknown May 18 '13 at 01:41
  • Not much. The keys of a dictionary are set-like and already arranged in a hash table, and therefore using a set for the static portion will not cause additional complications. Basically the fact that one is a dict means you may not need to convert the static part to a set (you could check all(itertools.imap(dict.has_key, mylist)) with O(n) performance). – Yann Vernier May 18 '13 at 14:28
  • I don't undestand how this (or any other solution relying on sets) can be the accepted answer here. The question is about lists and I frankly think that subset in "verify if one list is a subset of the other" is not to be taken literally. Upon conversion to sets, any information on duplicate elements is lost, yet, if the initial list could contain those, it might be important to check if they appear in the second list as well to truly say that all elements of one list can be found within the other. Sets do not do that! – inVader Jun 13 '19 at 08:05
  • Context matters; this was accepted for helping the asker, and did explain the distinction. We were told the candidates would be representable as sets, so it was a set task. Your case might be different, and the difference you mention would be solved using multisets such as collections.Counter. – Yann Vernier Jun 13 '19 at 13:53
54
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

all(x in two for x in one)

Explanation: Generator creating booleans by looping through list one checking if that item is in list two. all() returns True if every item is truthy, else False.

There is also an advantage that all return False on the first instance of a missing element rather than having to process every item.

Community
  • 1
  • 1
voidnologo
  • 1,005
  • 2
  • 12
  • 14
  • I think for readability and being explicit of what you are trying to achieve, `set(one).issubset(set(two))` is a great solution. With the solution I posted you should be able to use it with any objects if they have the proper comparison operators defined. – voidnologo Jan 23 '16 at 02:52
  • 7
    Use a generator expression, not a list comprehension; the former will allow `all` to short-circuit properly, the latter will perform all the checks even if it would be clear from the first check that the test would fail. Just drop the square brackets to get `all(x in two for x in one)`. – ShadowRanger Jan 23 '16 at 03:45
  • Am i wrong, or you can't use this method with locals? – Homper Dec 21 '19 at 21:08
  • IMO This is the best solution for subset-like checking while preserving order. – ajskateboarder Oct 09 '22 at 23:39
31

Assuming the items are hashable

>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False

If you don't care about duplicate items eg. [1, 2, 2] and [1, 2] then just use:

>>> set([1, 2, 2]).issubset([1, 2])
True

Is testing equality on the smaller list after an intersection the fastest way to do this?

.issubset will be the fastest way to do it. Checking the length before testing issubset will not improve speed because you still have O(N + M) items to iterate through and check.

jamylak
  • 128,818
  • 30
  • 231
  • 230
9

One more solution would be to use a intersection.

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one).intersection(set(two)) == set(one)

The intersection of the sets would contain of set one

(OR)

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one) & (set(two)) == set(one)
SuperNova
  • 25,512
  • 7
  • 93
  • 64
5

Set theory is inappropriate for lists since duplicates will result in wrong answers using set theory.

For example:

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)

has no meaning. Yes, it gives a false answer but this is not correct since set theory is just comparing: 1,3,5 versus 1,3,4,5. You must include all duplicates.

Instead you must count each occurrence of each item and do a greater than equal to check. This is not very expensive, because it is not using O(N^2) operations and does not require quick sort.

#!/usr/bin/env python

from collections import Counter

def containedInFirst(a, b):
  a_count = Counter(a)
  b_count = Counter(b)
  for key in b_count:
    if a_count.has_key(key) == False:
      return False
    if b_count[key] > a_count[key]:
      return False
  return True


a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

Then running this you get:

$ python contained.py 
b in a:  False
b in a:  True
Eamonn Kenny
  • 1,926
  • 18
  • 20
3
one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(x in two for x in one) == set([True])

If list1 is in list 2:

  • (x in two for x in one) generates a list of True.

  • when we do a set(x in two for x in one) has only one element (True).

SuperNova
  • 25,512
  • 7
  • 93
  • 64
0

Pardon me if I am late to the party. ;)

To check if one set A is subset of set B, Python has A.issubset(B) and A <= B. It works on set only and works great BUT the complexity of internal implementation is unknown. Reference: https://docs.python.org/2/library/sets.html#set-objects

I came up with an algorithm to check if list A is a subset of list B with following remarks.

  • To reduce complexity of finding subset, I find it appropriate to sort both lists first before comparing elements to qualify for subset.
  • It helped me to break the loop when value of element of second list B[j] is greater than value of element of first list A[i].
  • last_index_j is used to start loop over list B where it last left off. It helps avoid starting comparisons from the start of list B (which is, as you might guess unnecessary, to start list B from index 0 in subsequent iterations.)
  • Complexity will be O(n ln n) each for sorting both lists and O(n) for checking for subset.
    O(n ln n) + O(n ln n) + O(n) = O(n ln n).

  • Code has lots of print statements to see what's going on at each iteration of the loop. These are meant for understanding only.

Check if one list is subset of another list

is_subset = True;

A = [9, 3, 11, 1, 7, 2];
B = [11, 4, 6, 2, 15, 1, 9, 8, 5, 3];

print(A, B);

# skip checking if list A has elements more than list B
if len(A) > len(B):
    is_subset = False;
else:
    # complexity of sorting using quicksort or merge sort: O(n ln n)
    # use best sorting algorithm available to minimize complexity
    A.sort();
    B.sort();

    print(A, B);

    # complexity: O(n^2)
    # for a in A:
    #   if a not in B:
    #       is_subset = False;
    #       break;

    # complexity: O(n)
    is_found = False;
    last_index_j = 0;

    for i in range(len(A)):
        for j in range(last_index_j, len(B)):
            is_found = False;

            print("i=" + str(i) + ", j=" + str(j) + ", " + str(A[i]) + "==" + str(B[j]) + "?");

            if B[j] <= A[i]:
                if A[i] == B[j]:
                    is_found = True;
                last_index_j = j;
            else:
                is_found = False;
                break;

            if is_found:
                print("Found: " + str(A[i]));
                last_index_j = last_index_j + 1;
                break;
            else:
                print("Not found: " + str(A[i]));

        if is_found == False:
            is_subset = False;
            break;

print("subset") if is_subset else print("not subset");

Output

[9, 3, 11, 1, 7, 2] [11, 4, 6, 2, 15, 1, 9, 8, 5, 3]
[1, 2, 3, 7, 9, 11] [1, 2, 3, 4, 5, 6, 8, 9, 11, 15]
i=0, j=0, 1==1?
Found: 1
i=1, j=1, 2==1?
Not found: 2
i=1, j=2, 2==2?
Found: 2
i=2, j=3, 3==3?
Found: 3
i=3, j=4, 7==4?
Not found: 7
i=3, j=5, 7==5?
Not found: 7
i=3, j=6, 7==6?
Not found: 7
i=3, j=7, 7==8?
not subset
Hamza Rashid
  • 1,329
  • 15
  • 22
0

Below code checks whether a given set is a "proper subset" of another set

 def is_proper_subset(set, superset):
     return all(x in superset for x in set) and len(set)<len(superset)
Leo Bastin
  • 310
  • 4
  • 6
  • 1
    Why is your ideal for the empty set to break the established mathematical rule? Wikipedia: [The empty set { }, denoted by ∅, is also a subset of any given set X. It is also always a proper subset of any set except itself.](https://en.wikipedia.org/wiki/Subset) – Yann Vernier Nov 10 '17 at 07:37
  • Thanks @YannVernier I have modified to include empty checks for both subset and superset so it returns false when both are empty. – Leo Bastin Nov 14 '17 at 09:14
  • But *why* do you do this? For A to be a subset of B simply means A contains no items that are not in B, or equivalently, all items in A are also in B. The empty set is therefore a subset of *all* sets, including itself. Your extra checks assert that it is not, and you assert this is somehow ideal, but it is contrary to established terminology. What's the advantage? – Yann Vernier Nov 14 '17 at 17:03
  • Thanks @YannVernier Now the code checks whether a given set is a "proper subset" of another set. – Leo Bastin Nov 17 '17 at 13:03
  • This is as bad as the answers relying on the use of _sets_. While mathematically speaking, a set is a collection of distinct elements, we can and should not rely on that assumption when checking if one list is part of another. If the initial list were to contain duplicate, your function could still return _True_, even if the element in question is present in the second list only once. I do not think this is the correct behavior when trying to compare lists. – inVader Jun 13 '19 at 08:19
0

In python 3.5 you can do a [*set()][index] to get the element. It is much slower solution than other methods.

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

result = set(x in two for x in one)

[*result][0] == True

or just with len and set

len(set(a+b)) == len(set(a))
SuperNova
  • 25,512
  • 7
  • 93
  • 64
0

Here is how I know if one list is a subset of another one, the sequence matters to me in my case.

def is_subset(list_long,list_short):
    short_length = len(list_short)
    subset_list = []
    for i in range(len(list_long)-short_length+1):
        subset_list.append(list_long[i:i+short_length])
    if list_short in subset_list:
        return True
    else: return False
Samer Abu Gahgah
  • 751
  • 1
  • 9
  • 18
Mindee
  • 1
0

Most of the solutions consider that the lists do not have duplicates. In case your lists do have duplicates you can try this:

def isSubList(subList,mlist):
    uniqueElements=set(subList)
    for e in uniqueElements:
        if subList.count(e) > mlist.count(e):
            return False     
    # It is sublist
    return True

It ensures the sublist never has different elements than list or a greater amount of a common element.

lst=[1,2,2,3,4]
sl1=[2,2,3]
sl2=[2,2,2]
sl3=[2,5]

print(isSubList(sl1,lst)) # True
print(isSubList(sl2,lst)) # False
print(isSubList(sl3,lst)) # False
Ignacio Alorre
  • 7,307
  • 8
  • 57
  • 94
0

Since no one has considered comparing two strings, here's my proposal.

You may of course want to check if the pipe ("|") is not part of either lists and maybe chose automatically another char, but you got the idea.

Using an empty string as separator is not a solution since the numbers can have several digits ([12,3] != [1,23])

def issublist(l1,l2):
    return '|'.join([str(i) for i in l1]) in '|'.join([str(i) for i in l2])
y4cine
  • 396
  • 2
  • 20
  • I think the list should be sorted before comparing. Also it doesn't take into consideration if a list has duplications. – Eugene W. Aug 11 '23 at 06:23
  • That is only true if you want to check if the elements of the sublist are in the main one. If you want to check for the sublist with a given order is in that form in the main list, then my method is valid. – y4cine Aug 11 '23 at 08:43
-1

If you are asking if one list is "contained" in another list then:

>>>if listA in listB: return True

If you are asking if each element in listA has an equal number of matching elements in listB try:

all(True if listA.count(item) <= listB.count(item) else False for item in listA)
DevPlayer
  • 5,393
  • 1
  • 25
  • 20
  • This doesn't work for me. Returns false even if listA == listB – cass Nov 20 '15 at 17:05
  • @cass I've only tested with strings. Try this on your machine. http://pastebin.com/9whnDYq4 – DevPlayer Nov 20 '15 at 22:55
  • I was referring to the "if listA in listB: return True" part, not the second part. – cass Nov 21 '15 at 09:36
  • @cass Consider: ['one', 'two'] in ['one', 'two'] yields False. ['one', 'two'] in ['one', 'two', 'three'] yields False. ['one', 'two'] in [['one', 'two'], 'three'] yields True. So yes if listA == ListB then listA in listB will always return False because listA would need to be a list element within listB. Perhaps you are thinking: listA in listB means "Are items in listA listed as items in listB. That is not the meaning of listA in listB – DevPlayer Nov 21 '15 at 13:41
  • @cass Ah, I see how my post confuses. The original post asked to test for listA being a subset of listB. Technically my post is wrong based on the original post's question. For it to be right the question would have to have asked for "if listA in [item0, item2, listA, item3, listA, ]". Not "items in ['a', 'b', 'c'] in ['d', 'c', 'f', 'a', 'b', 'a']". – DevPlayer Nov 21 '15 at 13:49