How to resample a Time Series for a specific timerange for each group in a dataframe?

Question

I have the following dataframe:

date	group	value1	value2
2020-01-01	A	12	2
2020-02-01	A	11	4
2020-03-01	A	7	5
2020-04-01	A	23	3
2020-03-01	B	66	1
2020-04-01	B	61	8
2020-06-01	B	55	8
2020-04-01	C	3	0
2020-06-01	C	22	112
...	...	...	...

I am able to fill the missing dates in between dates per group:

df.set_index('date').groupby('group').resample('MS').asfreq().drop(['group'], 1).reset_index()

But I want to fill all the dates per group for a specific time range (e.g. 2020-01-01 to 2020-06-01).

So the result would look something like that:

date	group	value1	value2
2020-01-01	A	12	2
2020-02-01	A	11	4
2020-03-01	A	7	5
2020-04-01	A	23	3
2020-05-01	A	0	0
2020-06-01	A	0	0
2020-01-01	B	0	0
2020-02-01	B	0	0
2020-03-01	B	66	1
2020-04-01	B	61	8
2020-05-01	B	0	0
2020-06-01	B	55	8
...	...	...	...

As a start, part of your answer will likely involve `.fillna(0)` if you want `0` instead of `NaN` — Joshua Voskamp, Nov 11 '21 at 14:24

score 1 · Accepted Answer · answered Nov 11 '21 at 14:52

One option is to follow this similar answer and create a function and apply it to each group. You can set a vector of dates to use for reindexing, and put that into a function which will be applied to each group:

START = '01-01-2020'
END = '06-01-2020'
DATE_RANGE = pd.date_range(START, END, freq='MS')

def apply_reindex(df):
    reindexed = df.set_index('date').reindex(DATE_RANGE)
    filled = reindexed.fillna({'group':df.name, 'value1':0, 'value2':0})
    filled.index.name = 'date'
    filled = filled.reset_index()
    return filled

There's some annoying index setting/renaming stuff to be able to call reindex on the date column and then return it to be a column.

You can then call the function with:

df.groupby('group').apply(apply_reindex).reset_index(drop=True)

Which produces:

        date group  value1  value2
0  2020-01-01     A    12.0     2.0
1  2020-02-01     A    11.0     4.0
2  2020-03-01     A     7.0     5.0
3  2020-04-01     A    23.0     3.0
4  2020-05-01     A     0.0     0.0
5  2020-06-01     A     0.0     0.0
6  2020-01-01     B     0.0     0.0
7  2020-02-01     B     0.0     0.0
8  2020-03-01     B    66.0     1.0
9  2020-04-01     B    61.0     8.0
10 2020-05-01     B     0.0     0.0
11 2020-06-01     B    55.0     8.0
12 2020-01-01     C     0.0     0.0
13 2020-02-01     C     0.0     0.0
14 2020-03-01     C     0.0     0.0
15 2020-04-01     C     3.0     0.0
16 2020-05-01     C     0.0     0.0
17 2020-06-01     C    22.0   112.0

This works great - thank you. Can you tell me what the df.name in: "filled = reindexed.fillna({'group':df.name})" does? — SqHu, Nov 18 '21 at 12:52
@SqHu Sure! Within the `apply` of a function during a `groupby` operation, the DataFrame that is passed to the function will have its `name` attribute set according to the group it is part of. So since I am grouping by the `group` column, three DataFrames are passed to the function, and they either have a `.name` of `'A'`, `'B'`, or `'C'`. I basically just use that to fill any missing values produced by the reindexing for the `group` column. — Tom, Nov 18 '21 at 15:54

How to resample a Time Series for a specific timerange for each group in a dataframe?

1 Answers1