5

Is there a built-in Pythonic way to determine if one list completely contains the contents of another, including duplicated entries but disregarding the order of items?

>>> l1 = [2, 2, 3]
>>> l2 = [2, 2]
>>> l3 = [3, 2]
>>> l4 = [2, 2, 2]
>>> l5 = [2, 5, 2]

>>> is_superset(l1, l2)
True
>>> is_superset(l1, l3)
True
>>> is_superset(l1, l4)
False
>>> is_superset(l1, l5)
False
martineau
  • 119,623
  • 25
  • 170
  • 301
Chuck
  • 4,662
  • 2
  • 33
  • 55

4 Answers4

10

If there were no duplicates, or duplicates didn't matter (that is, if your l1 and l3 were both supersets of each other), you'd just use sets. But since if you want l1 to be a proper superset of l3, you're talking about multisets. Fortunately, Counter already implements multisets for you:

from collections import Counter
def is_superset(a, b):
    return not Counter(b) - Counter(a)

Notice that this - is proper multiset difference between multisets (just as - is proper set difference between sets), not an elementwise subtraction across dicts. So if you subtract a super(multi)set, you get an empty multiset (that is, Counter()—which is, like all empty collections in Python, falsey).

So now:

>>> is_superset(l1, l2)
True
>>> is_superset(l1, l3)
True
>>> is_superset(l1, l4)
False
>>> is_superset(l1, l5)
False

Plus:

>>> is_superset(l3, l1)
False
abarnert
  • 354,177
  • 51
  • 601
  • 671
  • Never heard of it, what is the semantics of `bool(Counter(..))`? Is there a documentation page on it? – Ben Usman May 04 '18 at 17:47
  • 1
    @BenUsman It's the same as _every_ collection in Python: empty collections are falsey, non-empty ones are true. (There are a few quasi-collections in third-party code, most famously numpy arrays, where that isn't true, but `Counter` isn't one of them.) – abarnert May 04 '18 at 17:48
  • @BenUsman See [Boolean operations](https://docs.python.org/3/reference/expressions.html#boolean-operations) on the builtins, [`__bool__`](https://docs.python.org/3/reference/datamodel.html#object.__bool__) on falling back to `__len__`. Beyond that, I guess it's not explicitly defined, but the fact that a `Counter` is defined as "a dict subclass" _implies_ that it should follow the same rules as dict, in the same way that it implies that `len` works. If you don't trust that, there's [the source](https://github.com/python/cpython/blob/3.6/Lib/collections/__init__.py#L451) – abarnert May 04 '18 at 17:58
  • thanks! The falseness of empty collections is indeed intuitive. I did not know that `Counter` supports all these algebraic operations similarly to `set`. – Ben Usman May 04 '18 at 19:24
4

Here is a solution using Counter

from collections import Counter

def is_superset(l1, l2):
    c1, c2 = Counter(l1), Counter(l2)
    return all(c1[k] >= c2[k] for k in c2)
bphi
  • 3,115
  • 3
  • 23
  • 36
  • 1
    I don't think you need the `k in c1` check. – Mark Dickinson May 04 '18 at 17:40
  • Right, forgot Counters give 0 for non existent keys. Thanks – bphi May 04 '18 at 17:42
  • 1
    Just subtract the counters. It's not elementwise subtraction, it's multiset subtraction, so you will get back an empty multiset if you subtract a super(multi)set. Or intersect them, as in [fferri's answer](https://stackoverflow.com/a/50180446/908494). Either way, if you're going to use a Counter as a multiset, take advantage of it being a multiset to write the intuitive multiset operation. – abarnert May 04 '18 at 17:45
3

Another solution using Counter and the builtin intersection method:

from collections import Counter

def is_superset(l1, l2):
    c1, c2 = Counter(l1), Counter(l2)
    return c1 & c2 == c2

Test:

>>> l1 = [2, 2, 3]
>>> l2 = [2, 2]
>>> l3 = [3, 2]
>>> l4 = [2, 2, 2]
>>> l5 = [2, 5, 2]
>>> is_superset(l1, l2)
True
>>> is_superset(l1, l3)
True
>>> is_superset(l1, l4)
False
>>> is_superset(l1, l5)
False
>>>
fferri
  • 18,285
  • 5
  • 46
  • 95
  • Somebody downvotes this question as soon as @fferri post question. Why ? – Mihai Alexandru-Ionut May 04 '18 at 17:43
  • 2
    It may be worth parenthesizing `c1 & c2` for clarity; while Python's precedence puts `&` at higher precedence than `==`, many other languages have it the other way. – user2357112 May 04 '18 at 17:46
  • The other way would read `return c1 & True` which doesn't make sense. – fferri May 04 '18 at 17:48
  • I think intersection may actually be clearer than set difference here, so I have no idea why someone downvoted your answer but not mine. Maybe just because they're a C user and were confused by the point that @user2357112 suggested clarifying? – abarnert May 04 '18 at 17:51
1

Here's an inefficient solution that verifies that for each element in the sublist, its occurrence number in the sublist must be lower or equal than its occurrence number in the superlist:

def is_superset(l1, l2):
    return all([l1.count(item) >= l2.count(item) for item in l2])
CristiFati
  • 38,250
  • 9
  • 50
  • 87