a list as a sublist of a list from group into list

Question

I have a DataFrame with two columns, a and b.

Is there a direct way to create a third column c, as below,

    a  b  c
0   1  2  0
1   1  1  1
2   1  1  0
3   1  2  1
4   1  1  0
5   2  0  0
6   2  1  1
7   2  1  0
8   2  2  1
9   2  2  0
10  2  1  0
11  2  1  0
12  2  2  0

where the target [1, 2] is a sublist (an in-order match, gaps allowed) of each list in df.groupby('a').b.apply(list)? In every group, I want to flag the first 2 rows that match the target.

df.groupby('a').b.apply(list) gives

1             [2, 1, 1, 2, 1]
2    [0, 1, 1, 2, 2, 1, 1, 2]

[1,2] is a sublist of [2, 1, 1, 2, 1] and [0, 1, 1, 2, 2, 1, 1, 2]
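
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the a and b columns of the table above and produces the per-group lists. The variable name `grouped` is only illustrative.

```python
import pandas as pd

# Columns a and b copied from the table in the question.
df = pd.DataFrame({
    "a": [1] * 5 + [2] * 8,
    "b": [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2],
})

# One list of b values per group of a.
grouped = df.groupby("a")["b"].apply(list)
print(grouped)
```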

So far, I have this function:

def is_sub_with_gap(sub, lst):
    '''
    check if sub is a sublist of lst
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)
            
        if j == ln:
            return True, ans
    return False, []

Testing the function:

In [55]: is_sub_with_gap([1,2], [2, 1, 1, 2, 1])
Out[55]: (True, [1, 3])
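
Note that the function returns positions inside the list it is given, not DataFrame row labels. A small sketch of the difference for the second group, assuming its rows are labelled 5 to 12 as in the table (the name `group_2_rows` is made up for illustration):

```python
def is_sub_with_gap(sub, lst):
    """Same logic as the function above: first in-order match, gaps allowed."""
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)
        if j == ln:
            return True, ans
    return False, []

# Row labels of the group a == 2 in the question's table (hypothetical helper list).
group_2_rows = [5, 6, 7, 8, 9, 10, 11, 12]

found, positions = is_sub_with_gap([1, 2], [0, 1, 1, 2, 2, 1, 1, 2])
# Translate positions within the group into row labels of the frame.
labels = [group_2_rows[p] for p in positions]
print(found, positions, labels)  # True [1, 3] [6, 8]
```

So the match sits at positions 1 and 3 within the group, which are rows 6 and 8 of the frame.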

Can you explain `[1, 2] is a sublist of df.groupby('a').b.apply(list)`? – jezrael, Sep 16 '20 at 06:46
It is not clear to me what `[1,2] is a sublist of [2, 1, 1, 2, 1] and [0, 1, 1, 2, 2, 1, 1, 2]` means, and why is `1` set in the new column? – jezrael, Sep 16 '20 at 06:54
@jezrael `[1,2]` is a sublist of `[2, 1, 1, 2, 1]`, where I can find row index `[1, 3]`; `[1,2]` is also a sublist of `[0, 1, 1, 2, 2, 1, 1, 2]`, where I can find row index `[6, 8]`. I mark these rows with 1 in the new column. – ComplicatedPhenomenon, Sep 16 '20 at 07:01
Why `[6,8]`? And not `[0, 1, (1, 2), (2, 1), 1, 2]` - 3 and 5? – jezrael, Sep 16 '20 at 07:05
@jezrael My definition of sublist can be found here: https://stackoverflow.com/questions/34599113/how-to-find-if-a-list-is-a-subset-of-another-list-in-order – ComplicatedPhenomenon, Sep 16 '20 at 07:07
I still don't understand. I tested `print (is_sub([2, 1, 1, 2, 1], [1,2])) print (is_sub([0, 1, 1, 2, 2, 1, 1, 2], [1,2]))` and both return `False`. – jezrael, Sep 16 '20 at 07:14
@jezrael Why not `print (is_sub([1,2], [2, 1, 1, 2, 1])) print (is_sub([1,2], [0, 1, 1, 2, 2, 1, 1, 2]))`? – ComplicatedPhenomenon, Sep 16 '20 at 07:18

jezrael · Accepted Answer · 2020-09-16T07:44:08.677


You can change the custom function so that it returns the index values of the matching rows in each group, flatten the result with Series.explode, and then test the DataFrame's index against those values with Index.isin:

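import pandas as pd

# Data from the question
df = pd.DataFrame({'a': [1] * 5 + [2] * 8,
                   'b': [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2]})

L = [1, 2]

def is_sub_with_gap(sub, lst):
    '''
    Check if sub is a sublist of lst (a Series here).
    Return the index labels of the first match, or [] if there is none.
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)

        if j == ln:
            # lst is a Series, so lst.index maps positions to row labels
            return lst.index[ans]
    return []

idx = df.groupby('a').b.apply(lambda x: is_sub_with_gap(L, x)).explode()

# Flag rows whose label is in idx; view the booleans as int8 (0/1)
df['c'] = df.index.isin(idx).view('i1')
print(df)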
    a  b  c
0   1  2  0
1   1  1  1
2   1  1  0
3   1  2  1
4   1  1  0
5   2  0  0
6   2  1  1
7   2  1  0
8   2  2  1
9   2  2  0
10  2  1  0
11  2  1  0
12  2  2  0
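
For comparison, here is a sketch of the same idea without `explode`, looping over the groups and collecting row labels directly. This is only an alternative under my own assumptions, not the method above: the helper name `first_match_positions` is made up, and the group a == 3 (which has no match) is hypothetical data added to show that such a group stays all 0.

```python
import pandas as pd

def first_match_positions(sub, values):
    """Positions of the first in-order match of sub in values (gaps allowed), or []."""
    positions, j = [], 0
    for i, v in enumerate(values):
        if j < len(sub) and v == sub[j]:
            positions.append(i)
            j += 1
    return positions if j == len(sub) else []

# Question data plus a hypothetical group a == 3 with b == [2, 1] (no match).
df = pd.DataFrame({
    "a": [1] * 5 + [2] * 8 + [3] * 2,
    "b": [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2, 2, 1],
})
target = [1, 2]

matched = []
for _, grp in df.groupby("a")["b"]:
    pos = first_match_positions(target, grp.tolist())
    # Translate positions within the group into row labels of the frame.
    matched.extend(grp.index[p] for p in pos)

df["c"] = df.index.isin(matched).astype(int)
print(df)
```

The loop is more verbose than the `apply` version, but it makes the position-to-label step explicit.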

edited Sep 16 '20 at 07:44

answered Sep 16 '20 at 07:34

jezrael


`lst.index[ans]` leads to the error `'builtin_function_or_method' object is not subscriptable` – ComplicatedPhenomenon Sep 16 '20 at 07:42
Is `is_sub_with_gap(lst, x)` the correct order? – jezrael Sep 16 '20 at 07:42
Tested with `is_sub_with_gap([1,2], [2, 1, 1, 2, 1])` – ComplicatedPhenomenon Sep 16 '20 at 07:44
@ComplicatedPhenomenon - I think the variable names in the lambda function were confusing, so I changed them to be clearer: `L = [1, 2]` and `idx = df.groupby('a').b.apply(lambda x: is_sub_with_gap(L, x)).explode()` – jezrael Sep 16 '20 at 07:45
@ComplicatedPhenomenon - Because it seems `x` and `L` were swapped – jezrael Sep 16 '20 at 07:45
@ComplicatedPhenomenon - Maybe the problem is that in my solution `lst` inside the function is not a list but a `Series`, so it works well here – jezrael Sep 16 '20 at 07:53
It works the way you showed. Thanks for your efforts – ComplicatedPhenomenon Sep 16 '20 at 08:02
@ComplicatedPhenomenon - Thank you for your patience ;) Happy coding! – jezrael Sep 16 '20 at 08:02
Theoretically speaking, I should get `df[df['c']==1].groupby('a').b.apply(list)` either as `L` or `[]` right? – ComplicatedPhenomenon Sep 16 '20 at 08:43
Yes, then you get an empty list, so `explode` removes those groups. – jezrael Sep 16 '20 at 08:44
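
A quick way to run the check from the last two comments, as a sketch that assumes column c already holds the values from the answer's output. As far as I know, `Series.explode` turns an empty list into a single NaN row rather than dropping it, and a group with no flagged rows simply does not appear in the filtered groupby. The small Series `s` with a no-match group 3 is hypothetical.

```python
import pandas as pd

# Frame with column c filled in as in the answer's printed output.
df = pd.DataFrame({
    "a": [1] * 5 + [2] * 8,
    "b": [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2],
    "c": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0],
})

# The flagged b values of every group should equal the target [1, 2].
check = df[df["c"] == 1].groupby("a")["b"].apply(list)
print(check)

# Hypothetical per-group result where group 3 had no match (empty list).
s = pd.Series([[1, 3], []], index=[1, 3])
exploded = s.explode()
print(exploded)  # group 3 is kept as one NaN row
```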
