I want to check whether a sublist is present in another (larger) list, with its elements in exactly the same order. I also want the check to allow wildcards. For example, I have the following lists:
>>> my_lists
[[0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 2, 2],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1],
[0, 1, 1, 1, 1, 2, 2, 2, 1, 0, 0, 0, 2, 2, 1, 1, 0, 0, 1, 1, 0],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 0, 1, 0, 0, 0],
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0],
[0, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 1, 2, 1, 1, 0, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 0, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
And the sublist [0, 0, 0, 1]. If I want to find which lists contain this exact sublist, I can do the following (taken from here):
def my_func(_list, sub_list):
    n = len(sub_list)
    # Compare sub_list against every slice of _list that has the same length
    return any(sub_list == _list[i:i+n] for i in range(len(_list)-n+1))

for l in my_lists:
    if my_func(l, [0, 0, 0, 1]):
        print(l)
... which basically builds every contiguous slice of the list that has the same length as sub_list and checks whether any of them is equal to it. I get the following output, since these lists contain [0, 0, 0, 1]:
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
[0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
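To make the slicing step concrete, here is a small sketch of the windows that get compared for a short list. The helper name `windows` and the `example` list are made up for illustration and are not part of the code above:

```python
def windows(_list, n):
    """Return every contiguous slice of length n, in order."""
    return [_list[i:i + n] for i in range(len(_list) - n + 1)]

example = [1, 0, 0, 0, 1, 2]  # illustrative data only
print(windows(example, 4))
# [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 2]]
```

The second window equals [0, 0, 0, 1], so my_func would return True for this list.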
Now I also want to add wildcards, meaning that the sublist can contain wildcard elements. For example, I now want to find the sublist [*, *, 0, 0, 0, 1, *]. An asterisk means that the element at that position can have any value, but there must be an element at that position. The sublist [*, *, 0, 0, 0, 1, *] would now output:
[1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0]
[1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0]
Note that [0, 0, 0, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] is no longer included, since this list doesn't have two values before the [0, 0, 0, 1] sequence starts. The same goes for [0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1], which also doesn't have two values before the sequence. The wildcard marker itself could be represented by anything, such as np.nan.
How would I extend the above code to allow for wildcards?
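For reference, one possible direction is to compare element by element instead of comparing whole slices. This is only a rough sketch under some assumptions: the names `WILDCARD` and `contains_with_wildcards` are hypothetical, and a dedicated sentinel object is used for the wildcard rather than np.nan, because NaN never compares equal to itself and would need special handling.

```python
WILDCARD = object()  # hypothetical sentinel standing in for "*"

def contains_with_wildcards(_list, pattern, wildcard=WILDCARD):
    """True if some window of _list matches pattern position by position.

    A wildcard position matches any value, but the window must still be
    len(pattern) elements long, so an element has to exist there.
    """
    n = len(pattern)
    return any(
        all(p is wildcard or p == x for p, x in zip(pattern, _list[i:i + n]))
        for i in range(len(_list) - n + 1)
    )

# Three of the lists from above, used as a quick check
candidates = [
    [1, 1, 1, 1, 0, 2, 1, 2, 1, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 2, 1, 2, 2, 1, 1],
    [1, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0],
]
pattern = [WILDCARD, WILDCARD, 0, 0, 0, 1, WILDCARD]
for l in candidates:
    if contains_with_wildcards(l, pattern):
        print(l)  # prints the first and third list only
```

Because every window still has exactly len(pattern) elements, a list where [0, 0, 0, 1] starts at index 0 or 1 cannot match this pattern, which is the behaviour described above.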