2

What is a efficient way to check if a list is within another list? Something like:

[2,3] in [1,2,3,4]      #evaluates True
[1,5,4] in [5,1,5,4]    #evaluates True
[1,2] in [4,3,2,1]      #evaluates False

Order within the list matters.

Sarah Pierson
  • 149
  • 10
  • Do the elements from the first list need to be consecutive within the second list? For instance, what should `[1,2] in [1,3,2]` return? – Aasmund Eldhuset Aug 11 '16 at 01:02
  • false, because it has to be consecutive – Sarah Pierson Aug 11 '16 at 01:11
  • 2
    This question is nothing like http://stackoverflow.com/questions/3313590/check-for-presence-of-a-sublist-in-python which assumes the data is binary and uses concatenation. The data here can be any numeric value, e.g. [2, 55, 100]. In this case, the proposed solution in the 'duplicate' question is not applicable. – Alexander Aug 11 '16 at 01:15
  • @Alexander - The OP of that question did not specify binary data only and there are answers/solutions that would work for *this* data.- specifically the answer with the most votes. – wwii Aug 11 '16 at 01:36
  • Actually, the question at http://stackoverflow.com/questions/3313590/check-for-presence-of-a-sublist-in-python/3314913#3314913 is the same as this one, and the most upvoted answer there would also work for this (and is coincidentally the same as my answer below). – Matthias Fripp Aug 11 '16 at 01:37
  • @wwii The OP's data is obviously not binary and your answer was not the accepted solution of that question (despite being the most upvoted). I agree that the most upvoted answer works in this scenario and is about 3x faster than my proposed solution. – Alexander Aug 11 '16 at 01:48

2 Answers2

3
def check_ordered_sublist_in_list(sub_list, main_list):
    sub_list = np.array(sub_list)
    main_list = np.array(main_list)
    return any(all(main_list[n:(n + len(sub_list))] == sub_list) 
               for n in range(0, len(main_list) - len(sub_list) + 1))

>>> check_ordered_sublist_in_list([2, 3], [1, 2, 3, 4])
True

>>> check_ordered_sublist_in_list([1, 5, 4], [5, 1, 5, 4])
True

>>> check_ordered_sublist_in_list([1, 2], [4, 3, 2, 1])
False

This converts the lists to numpy arrays (for computational efficiency) and then uses slicing to check if the sub_list is contained within the slice. Any success returns True.

Alexander
  • 105,104
  • 32
  • 201
  • 196
  • This seems like a misuse of `numpy`. Constructing the `np.array`s is a fairly expensive operation. This will most likely negate whatever computational efficiency is gained by using `numpy` just to perform some simple slices (especially since the lists in the example are very small). – pzp Aug 11 '16 at 01:47
  • @pzp Possibly, but the lists are only small in this simple example. Actual use cases tend to be much more complex. However, I agree that the `is_in` solution proposed by @mfripp is more efficient. – Alexander Aug 11 '16 at 01:53
  • Using "ordered_sublist" in the function name seems redundant—aren't all lists ordered in Python? – martineau Aug 11 '16 at 02:13
  • @martineau Yes, all lists are ordered. The function title was intended to clarify that [1, 2] is in [1, 2, 3, 4], but [2, 1] is not. The ordering matters, as clarified by the OP and to distinguish between an unordered collection. An initial reply incorrectly used sets, for example. – Alexander Aug 11 '16 at 02:22
  • The OP never asked about unordered collections, only if a list was a sublist of another. Since both are lists, both are ordered by definition. – martineau Aug 11 '16 at 02:25
  • @martineau "Order within the list matters." was later added to the question, so I simply added 'ordered' to the function title to clarify that the order of the list was being taken into consideration (vs. treating it as an unordered collection). – Alexander Aug 11 '16 at 02:29
2

You could use this:

def is_in(short, long):
    return any(short==long[i:i+len(short)] for i in range(len(long)-len(short)+1))

is_in([2,3], [1,2,3,4])   # True
is_in([1,5,4], [5,1,5,4]) # True
is_in([1,2], [4,3,2,1])   # False

If you really care about speed, these expressions are 20-30% faster:

def segments(long, length):
    return [long[i:i+length] for i in range(len(long)-length+1)]

def is_in_seg(short, long):
    return short in segments(long, len(short))

is_in_seg([1,5,4], [5,1,5,4])      # true
[1,5,4] in segments([5,1,5,4], 3)  # true

And this is 47% faster, but it uses tuples instead of lists:

import itertools
def segments_zip(long, length):
    return itertools.izip(*[long[i:] for i in xrange(length)])

(2,3) in segments_zip((1,2,3,4), 2)    # True
(1,5,4) in segments_zip((5,1,5,4), 3)  # True
(1,2) in segments_zip((4,3,2,1), 2)    # False

The extra speed comes from using itertools.izip, which can stop generating segments when a match is found; from using xrange, which avoids creating the whole range list; and from using tuples, which are generally slightly faster than lists. But the tiny speed advantage will vanish if you have to convert lists to tuples to use it.

Matthias Fripp
  • 17,670
  • 5
  • 28
  • 45