Efficient Matching of Sublists In Python

Question

Given a two-dimensional list, I would like to find everything that contains a sublist. I realize I can do something like:

#Psuedo-Python (not kosher)
def MatchAll(theList,toMatch):
    result=list(theList)
    for nMatch in toMatch:
        for nResult in result:
            if not nMatch in nResult:
                result.remove(nResult)
    return result

But there seems to be all kinds of bad about this. It seems very unlike the Python code I've seen and dealt with so far, besides I'm making changes to the list while iterating it, which I've read is not at all a good thing. Also, it seems horribly inefficient: while toMatch shouldn't have a length greater then three for my purposes, the length of theList is unknown and could be quite large. Any help is greatly appreciated, and thanks in advance.

The semantics of this aren't clear at all. What do you mean by "everything that contains a sublist"? — Sven Marnach, Nov 09 '11 at 22:46

agf · Accepted Answer · 2011-11-09T22:54:06.213

3

What I'd do is only keep sub-lists that match all the items in the "match" list.

def match_all(the_list, to_match):
    return [sublist for sublist in the_list 
                if all(item in sublist for item in to_match)]

You can speed this up by using a set:

def match_all(the_list, to_match):
    matches = set(to_match).issubset
    return [sublist for sublist in the_list if matches(sublist)]

edited Nov 09 '11 at 22:54

answered Nov 09 '11 at 22:48

agf

171,228
44
289
238

Please, make that condition `if set(item) <= set(to_match)`. :) – Sven Marnach Nov 09 '11 at 22:50
Holy Carp! This seems like it should be so obvious in hindsight. I thank you much, good sir. – LHT Nov 09 '11 at 22:52
Eh, you mean replace the all bit with the set()<=set() bit? I'm guessing converting lists to sets isn't too terribly inefficient? – LHT Nov 09 '11 at 22:53
@LHT I added a `set` version that doesn't require converting each `sublist` to a `set`. – agf Nov 09 '11 at 22:54
@SvenMarnach Was deciding on the best way to use a `set`. Look good? (Your version was backwards, and using `<=` requires the sublist be converted to a `set`). – agf Nov 09 '11 at 22:54
This will match every element from the search space that contains all the items in the sublist even if the matches are not sequential and/or do not appear in the same order as the search sublist. @LHT, is this what you wanted? – ktdrv Nov 09 '11 at 22:58
@kaloyan Yes, that's what I assumed he wanted. Only sublists that contain all elements in the "match" list. – agf Nov 09 '11 at 23:01
@agf: Yes, of course, far better than my suggestion. I already upvoted, so I can't honour the change. :) – Sven Marnach Nov 09 '11 at 23:02
@kaloyan Yup, exactly what I want. Both the 2d `the_list` and the 1d `to_match` in the example above should already be sorted and have had duplicates removed elsewhere, anyway. – LHT Nov 09 '11 at 23:03
@LHT If you were looking only for that exact sublist consecutively, in-order, my answer to http://stackoverflow.com/questions/7968881/finding-consecutive-values-within-a-list/7968953#7968953 shows some example "substring search" implementations. – agf Nov 09 '11 at 23:03

Efficient Matching of Sublists In Python

1 Answers1