I've been working on this for some time now and can't seem to work it out. I'm working with a multi-indexed dataframe with 2 levels which looks like the following:
import pandas as pd

def data():
    data = {'date': pd.Series(['2016-1-1', '2016-1-1', '2016-1-1',
                               '2016-1-2', '2016-1-2', '2016-1-2',
                               '2016-1-3', '2016-1-3', '2016-1-3',
                               '2016-1-4', '2016-1-4', '2016-1-4',
                               '2016-1-5', '2016-1-5', '2016-1-5']),
            'group': pd.Series(['groupA', 'groupB', 'groupC',
                                'groupA', 'groupB', 'groupC',
                                'groupA', 'groupB', 'groupC',
                                'groupA', 'groupB', 'groupC',
                                'groupA', 'groupB', 'groupC']),
            'series1': pd.Series([1, 2, 3,
                                  1, 2, 3,
                                  1, 2, 3,
                                  1, 3, 4,
                                  2, 3, 4]),
            'series2': pd.Series([1, 3, 4,
                                  2, 3, 3,
                                  2, 4, 2,
                                  1, 2, 3,
                                  1, 2, 3])}
    df = pd.DataFrame(data)
    df['date'] = pd.to_datetime(df['date'])
    df.set_index(['date', 'group'], inplace=True)
    return df
I have a column that specifies 1 of 3 conditions. There may be a more succinct way to write this portion of the code, but that is not my issue.
def add_cond(df):
    df['1minus2'] = df['series1'] - df['series2']
    # first condition
    mask1 = df['series1'] < df['series2']
    df.loc[mask1, 'result'] = 'less'
    # second condition
    mask2 = df['series1'] > df['series2']
    df.loc[mask2, 'result'] = 'greater'
    # third condition
    mask3 = df['series1'] == df['series2']
    df.loc[mask3, 'result'] = 'equal'
    return df
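For what it's worth, one more succinct form I've seen uses numpy.select to collapse the three masks into a single assignment (this should behave the same as the version above; anything matching neither inequality falls through to 'equal'):

```python
import numpy as np
import pandas as pd

def add_cond(df):
    df['1minus2'] = df['series1'] - df['series2']
    # Evaluate both inequalities once; rows matching neither are 'equal'.
    conditions = [df['series1'] < df['series2'],
                  df['series1'] > df['series2']]
    df['result'] = np.select(conditions, ['less', 'greater'], default='equal')
    return df
```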
My issue is that I want to add a column which counts the number of daily consecutive conditions. I've tried several different implementations of groupby and cumcount, and I'm able to get the cumulative counts of all conditions, but I want them to reset when the date index is not consecutive.
Below I've listed some related posts that I've tried. I thought the second answer in Pandas: conditional rolling count would work, but it uses a transform method, which doesn't seem to work here since I have multiple columns.
Following the strategy outlined in the post Finding consecutive segments in a pandas data frame, I created the code below, which creates a Series of numpy arrays containing the date and group index values and the data from the "result" column. I'm thinking I can slice this df in such a way that will allow me to count each consecutive group and merge those results back into the original df.
df1 = df.reset_index(level=['date','group']).groupby(['result']).apply(np.array)
The desired output (the consecutive count for each row) would look like this.
1
1
1
1
2
1
2
3
1
1
1
2
1
2
3
It's a bit hard to see whether the cumulative consecutive conditions are being met with the hierarchical df structure, but if I unstack the df it's much easier to see what I'm trying to achieve. Maybe there's a way to use unstack to orient the data in such a way that will give me the result I'm looking for?
df['result'].groupby(['date','group']).head().unstack()
Out[9]:
group groupA groupB groupC
date
2016-01-01 equal less less
2016-01-02 less less equal
2016-01-03 less less greater
2016-01-04 equal greater greater
2016-01-05 greater greater greater
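Update: one approach I've been experimenting with (a sketch, not fully tested, and it assumes the frame is sorted by date within each group, as data() produces it) is to mark a new "run" whenever the condition changes within a group OR the date jumps by more than one day, then number the rows within each run:

```python
import pandas as pd

def consecutive_count(df):
    s = df['result']
    groups = df.index.get_level_values('group')
    dates = df.index.get_level_values('date').to_series(index=df.index)

    # True where the condition differs from the previous row of the
    # same group (the first row of each group is always True).
    cond_changed = s != s.groupby(groups).shift()
    # True where the previous date in the same group is not exactly one
    # day earlier, i.e. the daily sequence is broken.
    gap = dates.groupby(groups).diff() != pd.Timedelta(days=1)
    # Cumulative sum of run starts gives an id for each consecutive run.
    run_id = (cond_changed | gap).groupby(groups).cumsum()
    # Number the rows within each (group, run) block, starting at 1.
    df['count'] = s.groupby([groups, run_id]).cumcount() + 1
    return df
```

On the toy data above this reproduces the count column I listed, but I'm not sure it's idiomatic or robust for real date gaps, hence the question.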