2

Given list_a and list_b. I want to run list_b through a function that gives all possible sublist of list_b (this part of the code works). I then want to take every sublist of list_b, and see if that sublist is also a sublist of list_a. If it is I should get a list of all the indexes, or splices where that sublist appears in list_a.

I'm able to get the code to work for sublist of length one, but cannot get it to work for longer lists.

Here's my current code to solve this problem:

import numpy as np

a = [0,1,2,3,0,2,3]
b = [0,2,3]

sublists = []
def all_sublists(my_list):  
    """ make a list containg every sublist of a my_list"""
    for i in range(len(my_list)):
        n = i+1
        while n <= len(my_list):
            sublist = my_list[i:n]
            sublists.append(sublist)
            n += 1

def sublists_splice(sublist, my_list):
    """if sublist is in my_list print sublist and the corresponding indexes"""
    values = np.array(my_list)
    print(str(sublist) + " found at " + str(np.where(values == sublist)[0]))

all_sublists(b)
for sublist in sublists:
    sublists_splice(sublist, a)

This is the output of the code:

[0] found at [0 4]
[0, 2] found at []
[0, 2, 3] found at []
[2] found at [2 5]
[2, 3] found at []
[3] found at [3 6]
/home/nicholas/Desktop/sublists.py:27: DeprecationWarning: elementwise == comparison failed; this will raise an error in the future.

Here's what I'd like to get:

[0] found at [0 4]
[0, 2] found at [4:6]
[0, 2, 3] found at [4:7]
[2] found at [2 5]
[2, 3] found at [2:4 5:7]
[3] found at [3 6]

I'm assuming there's a pythonic way to approach this. While I've tried a few bits of code they've all been very long and haven't worked...

One last note. I do need them to be sublists not subsets as order matters.

I appreciate any help. Thank you.

jpp
  • 159,742
  • 34
  • 281
  • 339
Osuynonma
  • 489
  • 1
  • 7
  • 13

2 Answers2

2

Using tools from Find boolean mask by pattern

def rolling_window(a, window):  #https://stackoverflow.com/q/7100242/2901002
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    c = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
    return c

def vview(a):  #based on @jaime's answer: https://stackoverflow.com/a/16973510/4427777
    return np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))

def sublist_loc(a, b):
    a, b = np.array(a), np.array(b)
    n = min(len(b), len(a))   
    sublists = [rolling_window(b, i) for i in range(1, n + 1)]
    target_lists = [rolling_window(a, i) for i in range(1, n + 1)]
    locs = [[np.flatnonzero(vview(target_lists[i]) == s) for s in vview(subl)] \
        for i, subl in enumerate(sublists)]
    for i in range (n):
        for j, k in enumerate(sublists[i]):
            print(str(k) + " found starting at index " + str(locs[i][j]))
    return sublists, target_lists, locs


_ = sublist_loc(a, b)
[0] found starting at index [0 4]
[2] found starting at index [2 5]
[3] found starting at index [3 6]
[0 2] found starting at index [4]
[2 3] found starting at index [2 5]
[0 2 3] found starting at index [4]

As an added benefit, all the rolling_window and vview calls are just to views of the original arrays, so there is no big memory hit to storing the combinations.

Daniel F
  • 13,620
  • 2
  • 29
  • 55
1

This is one solution using itertools.combinations. Note I have made it as lazy as possible, but that does not mean it is the most efficient solution available.

from itertools import combinations
import numpy as np

a = [0,1,2,3,0,2,3]
b = [0,2,3]

def get_combs(x):
    return (list(c) for i in range(1, len(x)) for c in combinations(x, i))

def get_index_arr(x, y):
    n = len(x)
    lst = (y[i:i+n] for i in range(len(y)-len(x)+1))
    return (i for i, j in enumerate(lst) if x == j)

combs = get_combs(b)

d = {tuple(c): list(get_index_arr(c, a)) for c in combs}

# {(0,): [0, 4],
#  (0, 2): [4],
#  (0, 3): [],
#  (2,): [2, 5],
#  (2, 3): [2, 5],
#  (3,): [3, 6]}
jpp
  • 159,742
  • 34
  • 281
  • 339