Fastest way to remove subsets of lists from a list in Python

Question

Suppose I have a list of lists like the one below (the actual list is much longer):

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

In this case, all the items in the lists ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] are contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] as they are subsets of ['apple', 'pear', 'banana', 'watermelon'].

My current solution is shown below. I first use ifilter and imap to create a generator for the supersets that each list might have. Then for those cases that do have supersets, I use compress and imap to drop them.

from itertools import imap, ifilter, compress

supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)


new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

I wonder if there are more efficient ways to do this?

Possible duplicate of [Python - verifying if one list is a subset of the other](http://stackoverflow.com/questions/16579085/python-verifying-if-one-list-is-a-subset-of-the-other) — Brent Washburne, Feb 04 '16 at 18:47
You can start by droping imap and ifilter to use generator expressions/list comprehensions. They work the same way but produce readable code... — JBernardo, Feb 04 '16 at 18:59
@BrentWashburne It's not exactly a duplicate. As you can see, my current solution does in fact use `issubset()` as suggested by the linked post. My question is more about how to remove lists that are subsets of other lists in a big list. — Alex, Feb 04 '16 at 19:01
@dawg: Sorry, forgot to change the code. `foo` was supposed to be `supersets`. I updated it — Alex, Feb 04 '16 at 19:32
Are all sub-lists guaranteed to be sets themselves, or could you have `[['pear', 'pineapple', 'pear'], ['pineapple', 'pear']]`? — Kyle Pittman, Feb 04 '16 at 19:38
@dawg works ok for me. There's an extra `from itertools` floating there but that's the only thing I changed. EDIT: well there was until an edit went through. ;) — Kyle Pittman, Feb 04 '16 at 19:47

lukaszzenko · Answer 1 · 2016-02-04T20:02:56.417

7

filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)

edited Feb 04 '16 at 20:02

answered Feb 04 '16 at 19:56

lukaszzenko

317
4
11

When I tried your code, I got `[['apple', 'pear'], [['apple', 'pear', 'banana'], ['banana', 'pear'], ['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]` – Alex Feb 04 '16 at 20:01
There was a mistake in my code. I think that current version should work. – lukaszzenko Feb 04 '16 at 20:03
It worked for me either way you had it - and either way, this is such a Pythonic answer. It makes me happy. – Kyle Pittman Feb 04 '16 at 20:05
Weird. In Canopy editor interface, I kept getting an empty list. But when I tried it in the command-line interface, I got the right results! Thanks for this – Alex Feb 04 '16 at 20:08

dawg · Accepted Answer · 2016-02-04T21:16:55.490

I don't know if it is faster but this is easier to read (to me anyway):

sets={frozenset(e) for e in fruits}  
us=set()
while sets:
    e=sets.pop()
    if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
        continue
    else:
        us.add(e)

Update

It is fast. Faster still is to use a for loop. Check timings:

fruits = [['apple', 'pear'],
        ['apple', 'pear', 'banana'],
        ['banana', 'pear'],
        ['pear', 'pineapple'],
        ['apple', 'pear', 'banana', 'watermelon']]

from itertools import imap, ifilter, compress    

def f1():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    while sets:
        e=sets.pop()
        if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
            continue
        else:
            us.append(list(e))   
    return us           

def f2():
    supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
    new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
    return new_list

def f3():
    return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)

def f4():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    for e in sets:
        if any(e < s for s in sets):
            continue
        else:
            us.append(list(e))   
    return us              

if __name__=='__main__':
    import timeit     
    for f in (f1, f2, f3, f4):
        print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()

On my machine on Python 2.7:

f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]

I tried `f1()`, and I got `set()` as the result..mm – Alex Feb 04 '16 at 20:04 — Alex, Feb 04 '16 at 20:04

silicon23 · Answer 3 · 2022-08-09T23:41:18.220

Answer posted by @lukaszzenko is correct and works for Python 2.

For Python 3, it will give the object. The code below works on Python 3.

list (filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits) )

Related post in stackoverflow: Python list filtering: remove subsets from list of lists

You may also find other ways of doing it in the link below: Remove sublists that are present in another sublist

Fastest way to remove subsets of lists from a list in Python

3 Answers3