I have a pandas.DataFrame df with a pandas.DatetimeIndex and a column named group_column. I need df to have a minutely frequency (i.e. a row for every minute). However, this needs to be the case for every value in group_column, so a given minute can potentially have several rows, one per group value.
NOTE:
- group_column can have hundreds of unique values.
- some groups can "last" several minutes while others last for days; the edges are determined by the first and last appearances of the values in group_column (for the example input below, these edges are shown in a small sketch right after it).
Example
Input:
import pandas as pd

dates = [pd.Timestamp('2018-01-01 12:00'), pd.Timestamp('2018-01-01 12:01'), pd.Timestamp('2018-01-01 12:01'), pd.Timestamp('2018-01-01 12:03'), pd.Timestamp('2018-01-01 12:04')]
df = pd.DataFrame({'group_column': ['a', 'a', 'b', 'a', 'b'], 'data_column': [1.2, 2.2, 4, 1, 2]}, index=dates)
                    group_column  data_column
2018-01-01 12:00:00            a          1.2
2018-01-01 12:01:00            a          2.2
2018-01-01 12:01:00            b          4.0
2018-01-01 12:03:00            a          1.0
2018-01-01 12:04:00            b          2.0
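
Just to make the note about the edges concrete, here is how the first and last appearance of each group value in this input can be read off (the edges name is only for illustration):

# "edges" of each group value in the example input:
#   a: 2018-01-01 12:00 .. 12:03,   b: 2018-01-01 12:01 .. 12:04
edges = df.reset_index().groupby('group_column')['index'].agg(['min', 'max'])
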
Desired output:
                    group_column  data_column
2018-01-01 12:00:00            a          1.2
2018-01-01 12:01:00            a          2.2
2018-01-01 12:02:00            a          2.2
2018-01-01 12:03:00            a          1.0
2018-01-01 12:01:00            b          4.0
2018-01-01 12:02:00            b          4.0
2018-01-01 12:03:00            b          4.0
2018-01-01 12:04:00            b          2.0
My attempt
I have done this, but it seems highly inefficient:
def group_resample(df, group_column_name):
    # yield one sub-frame per group value, padded to minutely frequency
    values = df[group_column_name].unique()
    for value in values:
        df_g = df.loc[df[group_column_name] == value]
        df_g = df_g.asfreq('min', method='pad')  # forward-fill every minute within the group's span
        yield df_g

df_padded = pd.concat(group_resample(df, 'group_column'))
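
For comparison, the same per-group padding can also be written with groupby.apply instead of the explicit loop. This is only a sketch of the same idea, and I don't know whether it is actually faster (and depending on the pandas version, apply may warn about the grouping column being included):

# same idea via groupby.apply: pad each group to minutely frequency within its own edges;
# group_keys=False keeps the plain DatetimeIndex instead of adding a group level
df_padded_alt = (df.groupby('group_column', group_keys=False)
                   .apply(lambda g: g.asfreq('min', method='pad')))

On the example input this should give the same frame as df_padded above.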