How to get the indices of a list's items in another list?

Question

Consider I have these lists:

l = [5,6,7,8,9,10,5,15,20]
m = [10,5]

I want to get the index of m in l. I used list comprehension to do that:

[(i,i+1) for i,j in enumerate(l) if m[0] == l[i] and m[1] == l[i+1]]

Output : [(5,6)]

But if I have more numbers in m, I feel this is not the right way. Is there an easier approach in Python or with NumPy?

Another example:

l = [5,6,7,8,9,10,5,15,20,50,16,18]
m = [10,5,15,20]

The output should be:

[(5,6,7,8)]

What's the index value supposed to be where it occurs more than once? `5` in your example... — Jon Clements, Aug 22 '17 at 11:22
I want it to check the list in that particular order. If the order is not found, it's OK to have an empty list. — Bharath M Shetty, Aug 22 '17 at 11:24
That's an interesting statement. However, what about the answer to the question? — Jon Clements, Aug 22 '17 at 11:24
I'm not 100% sure if the duplicate will solve your problem. Let me know if not. :) — MSeifert, Aug 22 '17 at 11:27
So you want to find the indices of a subsequence in a sequence? — Jon Clements, Aug 22 '17 at 11:31
@Bharathshetty maybe not of the linked one but it's definitely a duplicate of another — Jon Clements, Aug 22 '17 at 11:32
Might be a duplicate of https://stackoverflow.com/questions/10459493/find-indexes-of-sequence-in-list-in-python — Bharath M Shetty, Aug 22 '17 at 12:19
Being NumPy tagged, you might want to avail the vectorized capabilities of it. Re-opened. — Divakar, Aug 22 '17 at 13:14
Are your `l` and `m` lists or NumPy arrays? I realize that you've presented them as lists but that may just be convenience. — MSeifert, Aug 22 '17 at 14:08
Does `m` contain only unique numbers? If not, what is the expected behavior? — kederrac, Feb 07 '20 at 14:32

MSeifert · Answer 1 · 2017-08-22T14:12:35.573

The easiest way (using pure Python) would be to iterate over the items and first only check if the first item matches. This avoids doing sublist comparisons when not needed. Depending on the contents of your l this could outperform even NumPy broadcasting solutions:

def func(haystack, needle):  # obviously needs a better name ...
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx:idx+lengthneedle] == needle:
                yield tuple(range(idx, idx+lengthneedle))

>>> list(func(l, m))
[(5, 6, 7, 8)]
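
The slice comparison in func only works when haystack and needle are both lists: a list slice never equals a tuple, and comparing NumPy arrays with == gives an array rather than a single truth value. A minimal sketch of an element-wise variant that avoids this, assuming plain indexable sequences; the name find_subsequence is hypothetical and not part of the original answer:

```python
def find_subsequence(haystack, needle):
    """Yield index tuples for every occurrence of needle in haystack.

    Hypothetical variant of func: compares element by element, so it
    works for any indexable sequences (lists, tuples, 1-D arrays).
    """
    n = len(needle)
    if n == 0:
        return
    first = needle[0]
    # Only visit start positions that leave room for the whole needle.
    for idx in range(len(haystack) - n + 1):
        if haystack[idx] != first:
            continue
        if all(haystack[idx + k] == needle[k] for k in range(1, n)):
            yield tuple(range(idx, idx + n))


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18]
m = (10, 5, 15, 20)  # a tuple, which the list-slice comparison would never match
print(list(find_subsequence(l, m)))  # [(5, 6, 7, 8)]
```

The element-wise check is likely slower than a list slice comparison for long needles, so this is a trade of speed for generality.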

In case you're interested in speed, I checked the performance of the approaches (borrowing from an earlier benchmark setup of mine):

import random
import numpy as np

# strided_app is from https://stackoverflow.com/a/40085052/
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def pattern_index_broadcasting(all_data, search_data):
    n = len(search_data)
    all_data = np.asarray(all_data)
    all_data_2D = strided_app(np.asarray(all_data), n, S=1)
    return np.flatnonzero((all_data_2D == search_data).all(1))

# view1D is from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def pattern_index_view1D(all_data, search_data):
    a = strided_app(np.asarray(all_data), L=len(search_data), S=1)
    a0v, b0v = view1D(np.asarray(a), np.asarray(search_data))
    return np.flatnonzero(np.in1d(a0v, b0v))

def find_sublist_indices(haystack, needle):
    if not needle:
        return
    # just optimization
    lengthneedle = len(needle)
    firstneedle = needle[0]
    restneedle = needle[1:]
    for idx, item in enumerate(haystack):
        if item == firstneedle:
            if haystack[idx+1:idx+lengthneedle] == restneedle:
                yield tuple(range(idx, idx+lengthneedle))

def Divakar1(l, m):
    return np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))

def Divakar2(l, m):
    return np.squeeze(pattern_index_view1D(l, m)[:,None] + np.arange(len(m)))

def MSeifert(l, m):
    return list(find_sublist_indices(l, m))

# Timing setup
timings = {Divakar1: [], Divakar2: [], MSeifert: []}
sizes = [2**i for i in range(5, 20, 2)]

# Timing
for size in sizes:
    l = [random.randint(0, 50) for _ in range(size)]
    m = [random.randint(0, 50) for _ in range(10)]
    larr = np.asarray(l)
    marr = np.asarray(m)
    for func in timings:
        # first timings:
        # res = %timeit -o func(l, m)
        # second timings:
        if func is MSeifert:
            res = %timeit -o func(l, m)   
        else:
            res = %timeit -o func(larr, marr) 
        timings[func].append(res)

%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

for func in timings:
    ax.plot(sizes, 
            [time.best for time in timings[func]], 
            label=str(func.__name__))
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time [seconds]')
ax.grid(which='both')
ax.legend()
plt.tight_layout()

If your l and m are lists, my function outperforms the NumPy solutions for all tested sizes.

But if you have them as NumPy arrays, Divakar's NumPy solutions are faster for large arrays (more than about 1000 elements).

Benchmark graphs are really nice. And thanks a lot for that link. — Bharath M Shetty, Aug 22 '17 at 14:13
I sort of confirmed it at my end, but that threshold of vectorized solution(s) coming better off is occurring at 200 elems at my end with arrays data. I had to use `list(find_sublist_indices(a, b))`, just to make sure we are doing the same thing. — Divakar, Aug 22 '17 at 14:46
@Divakar You're right. I mixed fastest case for my function (lists) with the fastest case for your functions (arrays). Note that if you want to convert arrays to lists then `tolist` will be a lot faster than `list` (in most cases a factor of 2 - 5). If you're generally interested in a numpy array for all functions benchmark I'll update the answer later. :) — MSeifert, Aug 22 '17 at 14:52

Divakar · Accepted Answer · 2020-02-06T20:54:10.390

You are basically looking for the starting indices of a list in another list.

Approach #1 : One way to solve this is to create sliding windows over the list being searched, which gives a 2D array with one window per row, and then use NumPy broadcasting to compare the search list against every row. The rows in which all elements match give the starting indices. Thus, one method would be -

# strided_app is from https://stackoverflow.com/a/40085052/
def strided_app(a, L, S ):  # Window len = L, Stride len/stepsize = S
    nrows = ((a.size-L)//S)+1
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a, shape=(nrows,L), strides=(S*n,n))

def pattern_index_broadcasting(all_data, search_data):
    n = len(search_data)
    all_data = np.asarray(all_data)
    all_data_2D = strided_app(np.asarray(all_data), n, S=1)
    return np.flatnonzero((all_data_2D == search_data).all(1))

out = np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))

Sample runs -

In [340]: l = [5,6,7,8,9,10,5,15,20,50,16,18]
     ...: m = [10,5,15,20]
     ...: 

In [341]: np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))
Out[341]: array([5, 6, 7, 8])

In [342]: l = [5,6,7,8,9,10,5,15,20,50,16,18,10,5,15,20]
     ...: m = [10,5,15,20]
     ...: 

In [343]: np.squeeze(pattern_index_broadcasting(l, m)[:,None] + np.arange(len(m)))
Out[343]: 
array([[ 5,  6,  7,  8],
       [12, 13, 14, 15]])
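
As a possible alternative to the hand-written strided_app, NumPy 1.20 and later ship np.lib.stride_tricks.sliding_window_view, which builds the same kind of 2D window view with bounds checking. A minimal sketch under that assumption; the function name pattern_index_windows is hypothetical, and this swaps in sliding_window_view for the as_strided call used above:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def pattern_index_windows(all_data, search_data):
    """Return the start index of every occurrence of search_data.

    Hypothetical counterpart of pattern_index_broadcasting that uses
    sliding_window_view (NumPy >= 1.20) instead of as_strided.
    """
    a = np.asarray(all_data)
    b = np.asarray(search_data)
    if b.size == 0 or b.size > a.size:
        return np.array([], dtype=np.intp)
    # One window per row: shape (len(a) - len(b) + 1, len(b)).
    windows = sliding_window_view(a, b.size)
    return np.flatnonzero((windows == b).all(axis=1))


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18, 10, 5, 15, 20]
m = [10, 5, 15, 20]
starts = pattern_index_windows(l, m)
print(starts[:, None] + np.arange(len(m)))
# [[ 5  6  7  8]
#  [12 13 14 15]]
```

The sketch leaves out np.squeeze, so the result is always 2D with one row per match, even when there is only one match or none.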

Approach #2 : Another method would be to get the sliding windows and then view each row as a single scalar, both for the data being searched and for the data being searched for. This gives us 1D data to work with, like so -

# view1D is from https://stackoverflow.com/a/45313353/
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(),  b.view(void_dt).ravel()

def pattern_index_view1D(all_data, search_data):
    a = strided_app(np.asarray(all_data), L=len(search_data), S=1)
    a0v, b0v = view1D(np.asarray(a), np.asarray(search_data))
    return np.flatnonzero(np.in1d(a0v, b0v)) 

out = np.squeeze(pattern_index_view1D(l, m)[:,None] + np.arange(len(m)))

2020 Versions

In search of easier, more compact approaches, we could look into scikit-image's view_as_windows to get sliding windows with a built-in function. I am assuming arrays as inputs for less messy code. For lists as input, we have to use np.asarray() as shown earlier.

Approach #3 : Basically a derivative of pattern_index_broadcasting that uses view_as_windows for a one-liner, with a as the larger array and b as the array to search for -

from skimage.util import view_as_windows

np.flatnonzero((view_as_windows(a,len(b))==b).all(1))[:,None]+np.arange(len(b))

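Approach #4 : When b matches in only a few places in a, we could optimize by first looking for matches of the first element of b, which reduces the number of windows that need a full comparison -

# Candidate starts: positions where the first element of b matches
mask = a[:len(a)-len(b)+1]==b[0]
# Compare full windows against b only at the candidate positions
mask[mask] = (view_as_windows(a,len(b))[mask]==b).all(1)
out = np.flatnonzero(mask)[:,None]+np.arange(len(b))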

Approach #5 : For a small sized b, we could simply run a loop for each of the elements in b and perform bitwise and-reduction -

mask = np.bitwise_and.reduce([a[i:len(a)-len(b)+1+i]==b[i] for i in range(len(b))])
out = np.flatnonzero(mask)[:,None]+np.arange(len(b))
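
The shifted-slice idea of Approach #5 can be sketched as a self-contained function. This is a minimal sketch assuming 1D inputs; the name pattern_index_shifted is hypothetical, and the reduction is written as an explicit loop with &= rather than np.bitwise_and.reduce:

```python
import numpy as np


def pattern_index_shifted(a, b):
    """Start indices of b in a, AND-ing one shifted comparison per element of b.

    Hypothetical wrapper around the Approach #5 idea; assumes 1-D inputs.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    n_starts = len(a) - len(b) + 1
    if len(b) == 0 or n_starts <= 0:
        return np.array([], dtype=np.intp)
    mask = np.ones(n_starts, dtype=bool)
    for i in range(len(b)):
        # a[i:i + n_starts] lines up element i of every candidate window.
        mask &= a[i:i + n_starts] == b[i]
    return np.flatnonzero(mask)


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18, 10, 5, 15, 20]
m = [10, 5, 15, 20]
starts = pattern_index_shifted(l, m)
print(starts.tolist())  # [5, 12]
```

The loop runs once per element of b, which is why this form suits a small b.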

Thats so precise and fast. It gave output in nano seconds. Wow. — Bharath M Shetty, Aug 22 '17 at 13:29
Sir, what if I want to get all the repeated sequences rather than only the first one? — Bharath M Shetty, Aug 22 '17 at 13:43
@Bharathshetty Replace `np.argmax` with `np.flatnonzero`. Let me edit. — Divakar, Aug 22 '17 at 13:44
@Bharathshetty As I said, let me edit :) Let me know when you are done editing :) — Divakar, Aug 22 '17 at 13:51
@Bharathshetty I think the edits looks good now. Would be interesting to see if approach #2 is performing any better at your end. Thanks! — Divakar, Aug 22 '17 at 13:54
Both of them works fine sir. First one is bit faster than the second approach. — Bharath M Shetty, Aug 22 '17 at 13:56

score 3 · Answer 3 · answered Feb 06 '20 at 12:06

Just making the point that @MSeifert's approach can, of course, also be implemented in numpy:

def pp(h,n):
    nn = len(n)
    NN = len(h)
    c = (h[:NN-nn+1]==n[0]).nonzero()[0]
    if c.size==0: return
    for i,l in enumerate(n[1:].tolist(),1):
        c = c[h[i:][c]==l]
        if c.size==0: return
    return np.arange(c[0],c[0]+nn)
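
Note that pp returns only the first occurrence, and None when there is no match. A minimal sketch of a variant that keeps every surviving candidate, assuming 1D inputs; the name pp_all is hypothetical and not part of the original answer:

```python
import numpy as np


def pp_all(h, n):
    """Index rows for every occurrence of n in h (both 1-D).

    Hypothetical extension of pp: same candidate filtering, but it keeps
    all surviving start positions instead of only the first.
    """
    h = np.asarray(h)
    n = np.asarray(n)
    nn = len(n)
    if nn == 0 or nn > len(h):
        return np.empty((0, nn), dtype=np.intp)
    # Candidate starts: positions where the first element matches.
    c = np.flatnonzero(h[:len(h) - nn + 1] == n[0])
    for i in range(1, nn):
        if c.size == 0:
            break
        # Keep only candidates whose i-th following element also matches.
        c = c[h[c + i] == n[i]]
    return c[:, None] + np.arange(nn)


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18, 10, 5, 15, 20]
m = [10, 5, 15, 20]
print(pp_all(l, m).tolist())  # [[5, 6, 7, 8], [12, 13, 14, 15]]
```

An empty result has shape (0, len(n)), so callers can check the number of rows instead of testing for None.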

score 0 · Answer 4 · answered Feb 22 '22 at 13:25

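from collections import defaultdict

def get_data(l1, l2):
    # Map each item of l1 to the list of indices where it occurs
    d = defaultdict(list)
    for index, item in enumerate(l1):
        d[item].append(index)
    print(d)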

Using defaultdict to store indices of elements from other list.
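
The index built above does not yet locate l2 inside l1. One way it could be used, as a minimal sketch: take the stored positions of the first element of l2 as candidate starts and verify each one by slicing. The function name find_with_index is hypothetical, and the sketch assumes l1 is a list with hashable items:

```python
from collections import defaultdict


def find_with_index(l1, l2):
    """Index tuples for every occurrence of l2 in l1.

    Hypothetical completion of get_data: the position index supplies the
    candidate start positions, which are then verified by slicing.
    """
    if not l2:
        return []
    positions = defaultdict(list)
    for index, item in enumerate(l1):
        positions[item].append(index)
    n = len(l2)
    return [
        tuple(range(start, start + n))
        for start in positions.get(l2[0], [])
        if l1[start:start + n] == list(l2)
    ]


l = [5, 6, 7, 8, 9, 10, 5, 15, 20, 50, 16, 18]
m = [10, 5, 15, 20]
print(find_with_index(l, m))  # [(5, 6, 7, 8)]
```

Building the index pays off mainly when the same l1 is searched for many different patterns; for a single search, a direct scan does the same work without the extra dictionary.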
