Suppose I have a list of lists like the one below (the actual list is much longer):
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]
In this case, all the items in the lists ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] are contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove those three lists because they are subsets of it.
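To make "contained in, regardless of order" concrete, here is a minimal sketch of the check I have in mind, using two of the lists above. The names small, big and other are made up for this illustration only.

```python
small = ['banana', 'pear']
big = ['apple', 'pear', 'banana', 'watermelon']
other = ['pear', 'pineapple']

# Converting to sets ignores item order; <= tests "is a subset of".
small_is_subset = set(small) <= set(big)
other_is_subset = set(other) <= set(big)

print(small_is_subset)  # True: every item of small appears in big
print(other_is_subset)  # False: 'pineapple' is not in big
```

So ['banana', 'pear'] should be dropped, while ['pear', 'pineapple'] should be kept.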
My current solution (Python 2) is shown below. I first use ifilter and imap to build a generator that yields, for each list, the longer lists in fruits that are supersets of it. Then I use compress and imap to drop every list that has at least one such superset.
from itertools import imap, ifilter, compress
supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
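For anyone on Python 3, where imap and ifilter are no longer in itertools, the following is a rough equivalent of the snippet above. It is only a sketch of the same quadratic approach, not a faster one, and the helper name has_proper_superset is something I made up for illustration.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def has_proper_superset(candidate, lists):
    """Return True if some longer list in `lists` contains every item of `candidate`."""
    items = set(candidate)
    return any(len(candidate) < len(other) and items.issubset(other)
               for other in lists)


# Keep only the lists that are not contained in any longer list.
new_list = [lst for lst in fruits if not has_proper_superset(lst, fruits)]
print(new_list)
# [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
```

Like the original, this compares every list against every other list, so the number of subset checks grows with the square of the number of lists.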
Is there a more efficient way to do this?