Pandas: fill one column with count of # of obs between occurrences in a 2nd column

Question

Say I have the following DataFrame which has a 0/1 entry depending on whether something happened/didn't happen within a certain month.

import numpy as np
import pandas as pd

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)



            R
2010-01-01  0
2010-02-01  0
2010-03-01  1
2010-04-01  1
2010-05-01  0
2010-06-01  0
2010-07-01  0
2010-08-01  0
2010-09-01  1
2010-10-01  1
2010-11-01  1

What I want is to create a 2nd column that lists the # of months until the next occurrence of a 1.

That is, I need:

            R  F
2010-01-01  0  2
2010-02-01  0  1
2010-03-01  1  0
2010-04-01  1  0
2010-05-01  0  4
2010-06-01  0  3
2010-07-01  0  2
2010-08-01  0  1
2010-09-01  1  0
2010-10-01  1  0
2010-11-01  1  0

What I've tried: I haven't gotten far, but I can fill in the first stretch, up to the first 1:

A = list(df.index)
T = df[df['R']==1]

a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)

df.loc[a:b, 'F'] = np.linspace(c,0,c+1)

            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  NaN
2010-05-01  0  NaN
2010-06-01  0  NaN
2010-07-01  0  NaN
2010-08-01  0  NaN
2010-09-01  1  NaN
2010-10-01  1  NaN
2010-11-01  1  NaN

EDIT: It would probably have been better to provide an example that spans multiple years:

Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)

score 6 · Accepted Answer · answered Aug 09 '19 at 14:22

Here is my way

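# running count of 1s: every 1 starts a new group
s=df.R.cumsum()
# within each group, count down to the next 1 (only rows with R==0 are filled)
df.loc[df.R==0,'F']=s.groupby(s).cumcount(ascending=False)+1
# rows with R==1 were left as NaN above; set them to 0
df['F']=df['F'].fillna(0)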

df
Out[12]: 
            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  0.0
2010-05-01  0  4.0
2010-06-01  0  3.0
2010-07-01  0  2.0
2010-08-01  0  1.0
2010-09-01  1  0.0
2010-10-01  1  0.0
2010-11-01  1  0.0
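For reference, the same cumsum-and-countdown idea can be run on the multi-year data from the question's EDIT. This is only a sketch: the name `countdown` is introduced here, and `where` is used in place of the `loc` assignment plus `fillna` so the column stays integer-typed.

```python
import pandas as pd

# Multi-year example from the question's EDIT
Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# Running count of 1s: every 1 starts a new group
s = df["R"].cumsum()

# Count down within each group; +1 because the next 1 belongs to the next group
countdown = s.groupby(s).cumcount(ascending=False) + 1

# Keep the countdown only where R == 0; rows with R == 1 get 0
df["F"] = countdown.where(df["R"].eq(0), 0)
```

One caveat with this approach: if the data ends in a run of 0s, those rows still get a countdown to the end of the data even though no 1 follows.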

Answer 2 · user3483203 · answered Aug 09 '19 at 14:28

Create a series containing your dates, mask it wherever R is not equal to 1, bfill so that each row carries the date of the next 1, and subtract (in months)!

u = df.index.to_series()

ii = u.where(df.R.eq(1)).bfill()

12 * (ii.dt.year - u.dt.year) + (ii.dt.month - u.dt.month)

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, dtype: int64
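The date arithmetic above counts calendar months. If what is wanted is the number of rows (observations) until the next 1, as the title suggests, the same mask-and-bfill idea can be applied to row positions instead of dates. A minimal sketch on the multi-year data from the question's EDIT; the names `pos` and `next_one` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Multi-year example from the question's EDIT
Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# Row position of every observation
pos = pd.Series(np.arange(len(df)), index=df.index)

# Keep positions only where R == 1, then back-fill so that
# each row carries the position of the next 1
next_one = pos.where(df["R"].eq(1)).bfill()

# Distance in rows; nullable Int64 so rows with no later 1 become <NA>
df["F"] = (next_one - pos).astype("Int64")
```

Because this works on positions, it does not depend on the index being monthly, or on it being a DatetimeIndex at all.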

score 2 · Answer 3 · answered Aug 09 '19 at 14:13

Here is a way that worked for me. It's not as elegant as @user3483203's, but it does the job.

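df = df.reset_index()  # use a 0..n-1 integer index so that j + 1 is the next row
df['F'] = 0
for i in df.index:
    j = i
    # walk forward from row i until a 1 is found (or the data ends)
    while j < len(df) and df.loc[j, 'R'] == 0:
        df.loc[i, 'F'] += 1
        j += 1
df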

Out[39]: 
        index  R  F
0  2010-01-01  0  2
1  2010-02-01  0  1
2  2010-03-01  1  0
3  2010-04-01  1  0
4  2010-05-01  0  4
5  2010-06-01  0  3
6  2010-07-01  0  2
7  2010-08-01  0  1
8  2010-09-01  1  0
9  2010-10-01  1  0
10 2010-11-01  1  0


This actually works if the DataFrame spans multiple years. – measure_theory, Aug 09 '19 at 14:19
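The nested loop rescans the rows ahead of every 0. The same explicit style can be kept with a single pass from the last row to the first, carrying the distance to the most recent 1 seen. This is a sketch only: `steps_until_next_one` is a hypothetical helper name, and the data is the multi-year example from the question's EDIT.

```python
import pandas as pd

def steps_until_next_one(values):
    """Hypothetical helper: number of rows until the next 1, scanning right to left."""
    out = []
    dist = None  # None until a 1 has been seen, i.e. no 1 follows this row
    for v in reversed(list(values)):
        if v == 1:
            dist = 0
        elif dist is not None:
            dist += 1
        out.append(dist)
    out.reverse()
    return out

# Multi-year example from the question's EDIT
Y = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({"R": Y}, index=X)

# Nullable Int64 so that trailing 0s with no later 1 show up as <NA>
df["F"] = pd.array(steps_until_next_one(df["R"]), dtype="Int64")
```

This keeps the original DatetimeIndex (no reset_index needed) and visits each row once.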

score 1 · Answer 4 · answered Aug 09 '19 at 14:37

My take

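# start a new group at every change in R and at every 1, so each run of 0s is its own group
s = (df.R.diff().ne(0) | df.R.eq(1)).cumsum()
# count down within each group, then force rows with R == 1 to 0
s.groupby(s).transform(lambda g: np.arange(len(g),0,-1)).where(df.R.eq(0),0)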

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, Name: R, dtype: int64
