9

Say I have the following DataFrame which has a 0/1 entry depending on whether something happened/didn't happen within a certain month.

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)



            R
2010-01-01  0
2010-02-01  0
2010-03-01  1
2010-04-01  1
2010-05-01  0
2010-06-01  0
2010-07-01  0
2010-08-01  0
2010-09-01  1
2010-10-01  1
2010-11-01  1

What I want is to create a 2nd column that lists the # of months until the next occurrence of a 1.

That is, I need:

            R  F
2010-01-01  0  2
2010-02-01  0  1
2010-03-01  1  0
2010-04-01  1  0
2010-05-01  0  4
2010-06-01  0  3
2010-07-01  0  2
2010-08-01  0  1
2010-09-01  1  0
2010-10-01  1  0
2010-11-01  1  0

What I've tried: I haven't gotten far, but I'm able to fill the first bit

A = list(df.index)
T = df[df['R']==1]

a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)

df.loc[a:b, 'F'] = np.linspace(c,0,c+1)

            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  NaN
2010-05-01  0  NaN
2010-06-01  0  NaN
2010-07-01  0  NaN
2010-08-01  0  NaN
2010-09-01  1  NaN
2010-10-01  1  NaN
2010-11-01  1  NaN

EDIT Probably would have been better to provide an original example that spanned multiple years.

Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)
measure_theory
  • 874
  • 10
  • 28

4 Answers4

6

Here is my way

s=df.R.cumsum()
df.loc[df.R==0,'F']=s.groupby(s).cumcount(ascending=False)+1
df.F.fillna(0,inplace=True)

df
Out[12]: 
            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  0.0
2010-05-01  0  4.0
2010-06-01  0  3.0
2010-07-01  0  2.0
2010-08-01  0  1.0
2010-09-01  1  0.0
2010-10-01  1  0.0
2010-11-01  1  0.0
BENY
  • 317,841
  • 20
  • 164
  • 234
4

Create a series containing your dates, mask this series when your R series is not equal to 1, bfill, and subtract!


u = df.index.to_series()

ii = u.where(df.R.eq(1)).bfill()

12 * (ii.dt.year - u.dt.year) + (ii.dt.month - u.dt.month)

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, dtype: int64
user3483203
  • 50,081
  • 9
  • 65
  • 94
2

Here is a way that worked for me, not as elegant as @user3483203 but it does the job.

df['F'] = 0 
for i in df.index: 
     j = i 
     while df.loc[j, 'R'] == 0: 
         df.loc[i, 'F'] =df.loc[i, 'F'] + 1 
         j=j+1                                                                                                                      
df  
################
Out[39]: 
        index  R  F
0  2010-01-01  0  2
1  2010-02-01  0  1
2  2010-03-01  1  0
3  2010-04-01  1  0
4  2010-05-01  0  4
5  2010-06-01  0  3
6  2010-07-01  0  2
7  2010-08-01  0  1
8  2010-09-01  1  0
9  2010-10-01  1  0
10 2010-11-01  1  0

In [40]:   
nidabdella
  • 811
  • 8
  • 24
1

My take

s = (df.R.diff().ne(0) | df.R.eq(1)).cumsum()
s.groupby(s).transform(lambda s: np.arange(len(s),0,-1) if len(s)>1 else 0)

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, Name: R, dtype: int64
rafaelc
  • 57,686
  • 15
  • 58
  • 82