Python: find a list within members of another list (in order)

Question

If I have this:

a='abcdefghij'
b='de'

Then this finds b in a:

b in a => True

Is there a way of doing a similar thing with lists? Like this:

a=list('abcdefghij')
b=list('de')

b in a => False

The 'False' result is understandable - because it's rightly looking for an element 'de', rather than (what I happen to want it to do) 'd' followed by 'e'.

This works, I know:

a=['a', 'b', 'c', ['d', 'e'], 'f', 'g', 'h']
b=list('de')
b in a => True

I can crunch the data to get what I want - but is there a short Pythonic way of doing this?

To clarify: I need to preserve ordering here (b=['e','d'], should return False).

And if it helps, what I have is a list of lists: these lists represent all possible paths (a list of visited nodes) from node-1 to node-x in a directed graph. I want to 'factor' out common paths in any longer paths (so I am looking for all the irreducible 'atomic' paths that constitute all the longer paths).

UPDATE: There are performance concerns about this method, due to list copying in slices. Also, as it is recursive, you can hit the recursion limit for long lists. To eliminate copying, you can use NumPy slices, which create views, not copies. If you encounter performance or recursion-limit issues, you should use a solution without recursion.
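A minimal sketch of such a non-recursive solution, not the answer's original code: `contains_sublist` is a hypothetical name, and instead of NumPy views it simply compares elements by index, so no slice copies are made and no third-party package is needed.

```python
def contains_sublist(sub, big):
    """Return True if `sub` occurs in `big` as a contiguous, ordered run."""
    n = len(sub)
    if n == 0:
        return True  # assumption: the empty list counts as a sublist of any list
    # Only try start positions where `sub` still fits inside `big`.
    for start in range(len(big) - n + 1):
        # all() stops at the first mismatch; plain indexing creates no copies.
        if all(big[start + k] == sub[k] for k in range(n)):
            return True
    return False


print(contains_sublist(list('de'), list('abcdefghij')))  # True
print(contains_sublist(list('ed'), list('abcdefghij')))  # False
```

Because it is a plain loop, list length is limited by memory rather than by the recursion limit; the worst case is still about len(sub) * len(big) comparisons.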

edited Dec 09 '22 at 20:00

Glorfindel


answered Feb 12 '10 at 09:50

Rorick


This works well for me - and nicely written to make sense. Thanks! [I'm too new to Python to know what 'Pythonic' really means, but if I were a betting man, I would bet that this is Pythonic] – monojohnny Feb 12 '10 at 10:20
Oh, I have just shortened the code a bit. Try this new version. If it doesn't fit you (or I made a bug =)), you can stick to the old one. – Rorick Feb 12 '10 at 10:28
You're building a lot of copies of b due to recursion, that's a definite drawback of your solution. Imagine you have a list with one million entries, and your searched sublist appears at the very end. – Johannes Charra Feb 12 '10 at 10:34

Hope your lists aren't too big, as it creates approx. len(b) lists (len(b)/2 + len(b)/2). – tback Feb 12 '10 at 10:38
It's OK for my lists - never more than ~5-10 items. I'm going to stick to the old implementation for now, but I might take a look later. Cheers! – monojohnny Feb 12 '10 at 10:42
And ditto for my original one: if the list is too long, we get - yes - a stack overflow! – Rorick Feb 12 '10 at 14:39
There's no real reason to write this recursively when an iterative method is just as obvious. – Mike Graham Feb 12 '10 at 17:07
I've "tail-optimized" your code by hand. http://paste.pocoo.org/show/177416/ -- it is still slow but now it can handle any size list. – nosklo Feb 13 '10 at 09:01

nosklo · Answer 3 · 2010-02-13T08:44:19.583


I think this will be faster: it uses the C implementation of list.index to search for the first element, and goes on from there.

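def find_sublist(sub, bigger):
    """Return the index of the first occurrence of `sub` in `bigger`, or -1."""
    if not sub:
        return 0  # the empty list matches at the start
    if not bigger:
        return -1
    first, rest = sub[0], sub[1:]
    pos = 0
    try:
        while True:
            # list.index (implemented in C) jumps to the next candidate start;
            # pos then points just past that occurrence of `first`
            pos = bigger.index(first, pos) + 1
            if not rest or bigger[pos:pos+len(rest)] == rest:
                return pos - 1  # index of `first`, i.e. where `sub` starts
    except ValueError:
        # bigger.index found no further occurrence of `first`
        return -1

data = list('abcdfghdesdkflksdkeeddefaksda')
print(find_sublist(list('def'), data))  # prints 21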

Note that this returns the position of the sublist in the list, not just True or False. If you want just a bool you could use this:

def is_sublist(sub, bigger): 
    return find_sublist(sub, bigger) >= 0
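Building on the same `list.index` idea, here is a sketch of a hypothetical generator, `find_all_sublists`, that yields every start position instead of only the first one (overlapping matches included). It is self-contained and does not depend on `find_sublist`; treating an empty `sub` as matching at every position is an assumption, mirroring how '' behaves in strings.

```python
def find_all_sublists(sub, bigger):
    """Yield every index at which `sub` starts a contiguous run in `bigger`."""
    if not sub:
        # Assumed convention: the empty list matches at every position.
        yield from range(len(bigger) + 1)
        return
    first, n = sub[0], len(sub)
    pos = 0
    while True:
        try:
            # list.index scans for the next candidate start.
            pos = bigger.index(first, pos)
        except ValueError:
            return  # no further occurrence of the first element
        if bigger[pos:pos + n] == sub:
            yield pos
        pos += 1  # advance by one so overlapping matches are found


print(list(find_all_sublists([1, 1], [1, 1, 1, 2, 1, 1])))  # [0, 1, 4]
```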

edited Feb 13 '10 at 08:44

answered Feb 12 '10 at 12:07

nosklo


I timed this one, too ... you beat me, nosklo. :) – Johannes Charra Feb 12 '10 at 12:54
There is a small bug: the index returned (when substring is actually found) is one after the index of the first character. I think you should return pos-1 to correct for that. But nice answer, I noticed only because I'm using it :) – xuloChavez Apr 23 '12 at 12:05
special checking for `if not rest` is superfluous – panda-34 Mar 06 '16 at 16:43

Johannes Charra · Answer 4 · 2010-02-12T14:44:01.810

I timed the accepted solution, my earlier solution and a new one with an index. The one with the index is clearly best.

EDIT: I timed nosklo's solution, it's even much better than what I came up with. :)

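def is_sublist_index(a, b):
    """Return True if list a occurs contiguously, in order, in list b."""
    if not a:
        return True

    n = len(a)
    # Try each start position in b; restarting from start + 1 after a partial
    # match avoids missing cases such as a=[1, 1, 2] in b=[1, 1, 1, 2].
    for start in range(len(b) - n + 1):
        index = 0
        while index < n and b[start + index] == a[index]:
            index += 1
        if index == n:
            return True

    return False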

def is_sublist(a, b):
    # String-based check; can give false positives, e.g. [1] vs [11]
    return str(a)[1:-1] in str(b)[1:-1]

def is_sublist_copylist(a, b):
    # Recursive: each call copies b[1:], and long lists exceed the recursion limit
    if a == []: return True
    if b == []: return False
    return b[:len(a)] == a or is_sublist_copylist(a, b[1:])

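from timeit import Timer

# Assumes nosklo's find_sublist() is defined in this module; wrap it to return a bool
def sublist_nosklo(sub, bigger):
    return find_sublist(sub, bigger) >= 0

setup = 'from __main__ import is_sublist, is_sublist_copylist, is_sublist_index, sublist_nosklo'
print(Timer('is_sublist([99999], list(range(100000)))', setup=setup).timeit(number=100))
try:
    print(Timer('is_sublist_copylist([99999], list(range(100000)))', setup=setup).timeit(number=100))
except RecursionError:
    # one recursive call per element of b is far beyond the default limit
    print('is_sublist_copylist: RecursionError')
print(Timer('is_sublist_index([99999], list(range(100000)))', setup=setup).timeit(number=100))
print(Timer('sublist_nosklo([99999], list(range(100000)))', setup=setup).timeit(number=100))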

Output in seconds (100 runs each):

is_sublist: 4.51677298546
is_sublist_copylist: 4.5824368
is_sublist_index: 1.87861895561
sublist_nosklo: 0.357429027557

+1 for timing. This solution is the best from a performance point of view, but I prefer mine as it looks much clearer to me. This one is more the C way than Pythonic. But I'm glad that Python allows both ways) — Rorick, Feb 12 '10 at 12:58
I don't know about you, but I slept little last night) My solution doesn't work for such long lists because of the recursion limit. I even started to imagine that there's tail-call optimization in Python!! Of course, there isn't. Fix `is_sublist_copylist` to call itself, not another function %) — Rorick, Feb 12 '10 at 14:33
Nice. Making the smaller `list` more than just a single element will probably reveal more about the relative speed of the solutions. Try something like `range(19000,99999)` instead of `[99999]`. — MAK, Feb 12 '10 at 16:21

Ramashalanka · Answer 5 · 2010-02-12T09:45:50.393


So, if you aren't concerned about the order in which the subset appears (sets also ignore duplicates), you can do:

a=list('abcdefghij')
b=list('de')
set(b).issubset(set(a))

True

Edit after your clarification: if you need to preserve order, and the list really consists of single characters as in your question, you can use:

''.join(a).find(''.join(b)) >= 0
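A small sketch (plain Python, reusing the question's `a` and `b`) showing the difference between the two checks: the set test ignores order, while the `''.join(...).find(...)` check respects it. Because `str.find` returns 0 for a match at the very start, the result has to be compared with `>= 0`. Like the join approach itself, this assumes every element is a single character.

```python
a = list('abcdefghij')
b = list('de')

# Order-insensitive: the set test is True even for the reversed pair 'e', 'd'.
print(set(['e', 'd']).issubset(set(a)))   # True

# Order-sensitive: join single-character elements and search the string.
print(''.join(a).find(''.join(b)) >= 0)   # True
print(''.join(a).find('ed') >= 0)         # False

# A match at the very start is reported as index 0, hence ">= 0" above.
print(''.join(a).find('ab'))              # 0
```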

edited Feb 12 '10 at 09:45

answered Feb 12 '10 at 09:29

Ramashalanka


Thanks - but actually I do need to preserve order I'm afraid - I'll update the question to clarify that. – monojohnny Feb 12 '10 at 09:40
Thanks to this and another post I learnt 'join()' today ! Cheers – monojohnny Feb 12 '10 at 10:21

score 2 · Answer 6 · answered Apr 05 '17 at 16:55

This should work with any pair of lists, preserving the order. It checks whether b is a sublist of a:

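def is_sublist(b, a):
    """Return True if list b appears contiguously, in order, inside list a."""
    if len(b) > len(a):
        return False

    if not b or a == b:
        # the empty list is a sublist of everything; equal lists trivially match
        return True

    i = 0
    # only start positions where b still fits inside a
    while i <= len(a) - len(b):
        if a[i] == b[0]:
            flag = True
            j = 1
            while j < len(b):
                if a[i+j] != b[j]:
                    flag = False
                    break  # stop comparing at the first mismatch
                j += 1
            if flag:
                return True
        i += 1
    return False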

score 1 · Answer 7 · answered Feb 12 '10 at 10:06


>>> ''.join(b) in ''.join(a)
True

answered Feb 12 '10 at 10:06

mshsayem


That is quite promising (I see it 'flattens' the list to a string) - but in my case, the list members themselves are not always single characters [in fact they are numbers] - and I might lose information if I collapse them like this: my fault for over-simplifying the question! I might look at zero-padding the numbers so they are all fixed-size elements, though. Thanks. – monojohnny Feb 12 '10 at 10:13

What if you first convert the numbers into a delimited string, say: '*'.join([str(i) for i in b]) in '*'.join([str(i) for i in a]) – mshsayem Feb 12 '10 at 10:26

This would raise for `[2, 4] in [1, 7, 8]` and would return an untrue answer for `['qqhe', 'lloqq'] in ['why', 'hello']`. – Mike Graham Feb 12 '10 at 17:06
See the comments above... I had not changed the answer since OP has accepted one... – mshsayem Feb 13 '10 at 10:52
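One way to avoid the false positives pointed out above, assuming every element is an integer (as with the OP's node numbers): put the delimiter at both ends as well as between items, so a match must start and end on element boundaries. `int_sublist_in` and `_encode` are hypothetical helpers, not code from this thread; the trick only works because str() of an int never contains the '*' delimiter.

```python
def _encode(items):
    # '*' never appears in str() of an int, so it cleanly separates elements;
    # the leading/trailing '*' prevent partial matches such as [1] inside [11].
    return '*' + '*'.join(str(i) for i in items) + '*'


def int_sublist_in(sub, big):
    """True if the integer list `sub` occurs contiguously, in order, in `big`."""
    if not sub:
        return True
    return _encode(sub) in _encode(big)


print(int_sublist_in([2, 4], [1, 7, 8]))     # False
print(int_sublist_in([1], [11]))             # False
print(int_sublist_in([3, 4], [1, 3, 4, 5]))  # True
```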

score 0 · Answer 8 · answered Feb 12 '10 at 09:31


Not sure how complex your application is, but for pattern matching in lists, pyparsing is very smart and easy to use.

answered Feb 12 '10 at 09:31

PhoebeB


Thanks for the link - might be a little OTT for what I want, but I will check it out. Cheers – monojohnny Feb 12 '10 at 10:16

Johannes Charra · Answer 9 · 2010-02-12T18:27:57.690


Use the lists' string representation and remove the square braces. :)

def is_sublist(a, b):
    return str(a)[1:-1] in str(b)

EDIT: Right, there are false positives ... e.g. is_sublist([1], [11]). Crappy answer. :)

edited Feb 12 '10 at 18:27

answered Feb 12 '10 at 10:27

Johannes Charra


Nice - although in my case I would need to pre-process my items to ensure they are all the same length - otherwise going back from string -> list fails (I actually need an index returned as well as True|False for my implementation). Cheers – monojohnny Feb 12 '10 at 10:44
This has false positives in general. – Mike Graham Feb 12 '10 at 17:09
