1

I have the following dataframe:

date group value1 value2
2020-01-01 A 12 2
2020-02-01 A 11 4
2020-03-01 A 7 5
2020-04-01 A 23 3
2020-03-01 B 66 1
2020-04-01 B 61 8
2020-06-01 B 55 8
2020-04-01 C 3 0
2020-06-01 C 22 112
... ... ... ...

I am able to fill the missing dates in between dates per group:

df.set_index('date').groupby('group').resample('MS').asfreq().drop(['group'], 1).reset_index()

But I want to fill all the dates per group for a specific time range (e.g. 2020-01-01 to 2020-06-01).

So the result would look something like that:

date group value1 value2
2020-01-01 A 12 2
2020-02-01 A 11 4
2020-03-01 A 7 5
2020-04-01 A 23 3
2020-05-01 A 0 0
2020-06-01 A 0 0
2020-01-01 B 0 0
2020-02-01 B 0 0
2020-03-01 B 66 1
2020-04-01 B 61 8
2020-05-01 B 0 0
2020-06-01 B 55 8
... ... ... ...
SqHu
  • 99
  • 1
  • 5

1 Answers1

1

One option is to follow this similar answer and create a function and apply it to each group. You can set a vector of dates to use for reindexing, and put that into a function which will be applied to each group:

START = '01-01-2020'
END = '06-01-2020'
DATE_RANGE = pd.date_range(START, END, freq='MS')

def apply_reindex(df):
    reindexed = df.set_index('date').reindex(DATE_RANGE)
    filled = reindexed.fillna({'group':df.name, 'value1':0, 'value2':0})
    filled.index.name = 'date'
    filled = filled.reset_index()
    return filled

There's some annoying index setting/renaming stuff to be able to call reindex on the date column and then return it to be a column.

You can then call the function with:

df.groupby('group').apply(apply_reindex).reset_index(drop=True)

Which produces:

        date group  value1  value2
0  2020-01-01     A    12.0     2.0
1  2020-02-01     A    11.0     4.0
2  2020-03-01     A     7.0     5.0
3  2020-04-01     A    23.0     3.0
4  2020-05-01     A     0.0     0.0
5  2020-06-01     A     0.0     0.0
6  2020-01-01     B     0.0     0.0
7  2020-02-01     B     0.0     0.0
8  2020-03-01     B    66.0     1.0
9  2020-04-01     B    61.0     8.0
10 2020-05-01     B     0.0     0.0
11 2020-06-01     B    55.0     8.0
12 2020-01-01     C     0.0     0.0
13 2020-02-01     C     0.0     0.0
14 2020-03-01     C     0.0     0.0
15 2020-04-01     C     3.0     0.0
16 2020-05-01     C     0.0     0.0
17 2020-06-01     C    22.0   112.0
Tom
  • 8,310
  • 2
  • 16
  • 36
  • This works great - thank you. Can you tell me what the df.name in: "filled = reindexed.fillna({'group':df.name})" does? – SqHu Nov 18 '21 at 12:52
  • 1
    @SqHu Sure! Within the `apply` of a function during a `groupby` operation, the DataFrame that is passed to the function will have its `name` attribute set according to the group it is part of. So since I am grouping by the `group` column, three DataFrames are passed to the function, and they either have a `.name` of `'A'`, `'B'`, or `'C'`. I basically just use that to fill any missing values produced by the reindexing for the `group` column. – Tom Nov 18 '21 at 15:54