
I have a DataFrame with two columns:

    a  b
0   1  2
1   1  1
2   1  1
3   1  2
4   1  1
5   2  0
6   2  1
7   2  1
8   2  2
9   2  2
10  2  1
11  2  1
12  2  2

Is there a direct way to create a third column c, as below?

    a  b  c
0   1  2  0
1   1  1  1
2   1  1  0
3   1  2  1
4   1  1  0
5   2  0  0
6   2  1  1
7   2  1  0
8   2  2  1
9   2  2  0
10  2  1  0
11  2  1  0
12  2  2  0

where the target [1, 2] is a sublist (in order, gaps allowed) of each group's list from df.groupby('a').b.apply(list). In every group, the 2 rows that first match the target get 1 in column c, and all other rows get 0.


df.groupby('a').b.apply(list) gives

1             [2, 1, 1, 2, 1]
2    [0, 1, 1, 2, 2, 1, 1, 2]

[1,2] is a sublist of both [2, 1, 1, 2, 1] and [0, 1, 1, 2, 2, 1, 1, 2].
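To reproduce the example, here is a minimal sketch (assuming pandas is installed) that builds the sample DataFrame, collects each group's b values with df.groupby('a').b.apply(list), and checks the "in order, gaps allowed" notion of sublist. is_subsequence is a hypothetical helper written for illustration; it uses a common iterator idiom rather than the question's is_sub_with_gap, and it only answers yes/no without returning positions.

```python
import pandas as pd

# Sample data from the question: 'a' is the group key, 'b' the values
df = pd.DataFrame({
    'a': [1] * 5 + [2] * 8,
    'b': [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2],
})

def is_subsequence(sub, lst):
    """True if the items of sub appear in lst in the same order, gaps allowed."""
    it = iter(lst)
    # 'x in it' consumes the iterator up to and including the first match of x,
    # so each later item of sub is only searched for after the previous match
    return all(x in it for x in sub)

# One list of 'b' values per group, in row order
lists = df.groupby('a').b.apply(list)
result = lists.apply(lambda lst: is_subsequence([1, 2], lst))
print(result)
```

Both groups come out True. A list such as [2, 2, 1] would give False, because no 2 follows the 1.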


So far, I have this function:

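```python
def is_sub_with_gap(sub, lst):
    '''
    Check if sub is a sublist of lst (in order, gaps allowed).
    Return (True, positions of the first match) or (False, []).
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        # advance to the next item of sub whenever the current one is found
        if ele == sub[j]:
            j += 1
            ans.append(i)

        if j == ln:
            return True, ans
    return False, []
```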

Testing the function:

```
In [55]: is_sub_with_gap([1,2], [2, 1, 1, 2, 1])
Out[55]: (True, [1, 3])
```
ComplicatedPhenomenon
  • Can you explain `[1, 2] is a sublist of df.groupby('a').b.apply(list)`? – jezrael Sep 16 '20 at 06:46
  • It is not clear to me why `[1,2] is a sublist of [2, 1, 1, 2, 1] and [0, 1, 1, 2, 2, 1, 1, 2]`, or why `1` is set in the new column. – jezrael Sep 16 '20 at 06:54
  • @jezrael `[1,2]` is a sublist of `[2, 1, 1, 2, 1]`, so I can find row indices `[1, 3]`; `[1,2]` is a sublist of `[0, 1, 1, 2, 2, 1, 1, 2]`, so I can find row indices `[6, 8]`. I set these rows to 1 in the new column. – ComplicatedPhenomenon Sep 16 '20 at 07:01
  • Why `[6,8]` ? And not `[0, 1, (1, 2), (2, 1), 1, 2]` - 3 and 5 ? – jezrael Sep 16 '20 at 07:05
  • @jezrael https://stackoverflow.com/questions/34599113/how-to-find-if-a-list-is-a-subset-of-another-list-in-order, my definition of sublist can be found here. – ComplicatedPhenomenon Sep 16 '20 at 07:07
  • I still don't understand. I tested `print (is_sub([2, 1, 1, 2, 1], [1,2])) print (is_sub([0, 1, 1, 2, 2, 1, 1, 2], [1,2]))` and both return `False` – jezrael Sep 16 '20 at 07:14
  • @jezrael why not `print (is_sub([1,2], [2, 1, 1, 2, 1])) print (is_sub([1,2], [0, 1, 1, 2, 2, 1, 1, 2]))`? – ComplicatedPhenomenon Sep 16 '20 at 07:18
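The disagreement in the comments comes from two readings of "sublist": a contiguous run, or an in-order subsequence with gaps (the definition the asker links to). A minimal sketch, with hypothetical helper names first_contiguous_match and first_gapped_match, comparing the first match of [1, 2] in group a == 2 under each reading and mapping the positions to that group's row labels 5 to 12:

```python
def first_contiguous_match(sub, lst):
    """Positions of the first contiguous run in lst equal to sub, or []."""
    n = len(sub)
    for start in range(len(lst) - n + 1):
        if lst[start:start + n] == sub:
            return list(range(start, start + n))
    return []

def first_gapped_match(sub, lst):
    """Positions of the first greedy in-order match of sub in lst
    (gaps allowed), or [] if sub is not a subsequence of lst."""
    pos, j = [], 0
    for i, ele in enumerate(lst):
        if j < len(sub) and ele == sub[j]:
            pos.append(i)
            j += 1
    return pos if j == len(sub) else []

# Group a == 2 from the sample data: its b values and their row labels
values = [0, 1, 1, 2, 2, 1, 1, 2]
labels = list(range(5, 13))

contiguous = [labels[p] for p in first_contiguous_match([1, 2], values)]
gapped = [labels[p] for p in first_gapped_match([1, 2], values)]
print(contiguous)  # [7, 8]
print(gapped)      # [6, 8]
```

Only the gapped reading yields rows 6 and 8, which are the rows marked 1 in the desired column c.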

1 Answer


You can change the custom function to return the index labels of the matched rows in each group, flatten the result with Series.explode, and then test the DataFrame's index with Index.isin:

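```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'a': [1] * 5 + [2] * 8,
                   'b': [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2]})

L = [1, 2]

def is_sub_with_gap(sub, lst):
    '''
    Check if sub is a sublist of the Series lst (in order, gaps allowed)
    and return the index labels of the first match, or [] if there is none.
    '''
    ln, j = len(sub), 0
    ans = []
    for i, ele in enumerate(lst):
        if ele == sub[j]:
            j += 1
            ans.append(i)

        if j == ln:
            # translate positions within the group into the group's index labels
            return lst.index[ans]
    return []

# one set of matched labels per group, flattened into a single Series
idx = df.groupby('a').b.apply(lambda x: is_sub_with_gap(L, x)).explode()

# 1 where the row label was matched, 0 otherwise (bool mask viewed as int8)
df['c'] = df.index.isin(idx).view('i1')
print(df)
```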
    a  b  c
0   1  2  0
1   1  1  1
2   1  1  0
3   1  2  1
4   1  1  0
5   2  0  0
6   2  1  1
7   2  1  0
8   2  2  1
9   2  2  0
10  2  1  0
11  2  1  0
12  2  2  0
jezrael
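A different approach from the answer above, offered as a sketch rather than a replacement: groupby(...).transform returns results aligned with the original rows, so the 0/1 flags can be assigned to c directly, without the explode and isin steps. flag_first_match is a hypothetical helper that uses the same greedy in-order matching as is_sub_with_gap.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1] * 5 + [2] * 8,
    'b': [2, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 1, 2],
})
L = [1, 2]

def flag_first_match(sub, s):
    """Return a 0/1 list as long as s, with 1 at the positions of the first
    greedy in-order match of sub (gaps allowed); all 0 if there is no match."""
    flags = [0] * len(s)
    pos, j = [], 0
    for i, ele in enumerate(s):
        if j < len(sub) and ele == sub[j]:
            pos.append(i)
            j += 1
    # only mark rows when every item of sub was found
    if j == len(sub):
        for p in pos:
            flags[p] = 1
    return flags

# transform keeps each group's original index, so the result aligns with df
df['c'] = df.groupby('a')['b'].transform(lambda s: flag_first_match(L, s))
print(df)
```

The trade-off is that the helper builds one flag per row instead of returning only the matched labels, which makes the assignment simpler at the cost of a full-length list per group.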