0

I have a pandas dataframe that looks like this

import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                    'col_a': [1,1,0,1,0,1,1,1,0,1],
                    'col_b': [0,1,1,0,0,0,1,1,1,0]})

I would like to create 2 columns:

  • col_a_consequent: 1 if col_a has n consequent occurrences of 1s by id
  • col_c: 1 if after 2 consequent occurrences of 1s at col_a there is 1 at col_b

The output dataframe looks like this: for n=3

foo = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
                        'col_a': [1,1,0,1,0,1,1,1,0,1],
                        'col_b': [0,1,1,0,0,0,1,1,1,0],
                        'col_a_consequent': [0,0,0,0,0,1,1,1,0,0],
                        'col_c': [1,1,1,0,0,1,1,1,1,0]})

For col_a_consequent according to this question I can obtain what I want

n = 3

foo_tmp = foo.query('id == 2')

(foo_tmp.col_a.groupby((foo_tmp.col_a != foo_tmp.col_a.shift()).\
                  cumsum()).transform('size') * foo_tmp.col_a >= n).astype(int)

but I dont know how I can do the same operation with groupby for all ids

Any ideas ?

quant
  • 4,062
  • 5
  • 29
  • 70

1 Answers1

2

For the first you could use:

N = 3
foo['col_a_consequent'] = (foo
                           .groupby(['id', foo['col_a'].ne(foo['col_a'].shift()).cumsum()])
                           ['col_a'].transform('size').ge(N).astype(int)
                          )

output:

   id  col_a  col_b  col_a_consequent2  col_a_consequent3
0   1      1      0                  1                  0
1   1      1      1                  1                  0
2   1      0      1                  0                  0
3   1      1      0                  0                  0
4   1      0      0                  0                  0
5   2      1      0                  1                  1
6   2      1      1                  1                  1
7   2      1      1                  1                  1
8   2      0      1                  0                  0
9   2      1      0                  0                  0
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Your solution seems to work for `N=3`, but it does not work as expected for `N=2`. For `N=2` the `col_a_consequent` column should have values `[1,1,0,0,0,1,1,1,0,0]` whereas your solution gives as output `[1,1,1,1,0,1,1,1,1,1]` – quant Mar 08 '22 at 12:25
  • 1
    @quant you're right, I updated the answer ;) – mozway Mar 08 '22 at 12:33