So, I have a dataframe like this one:
date ID value
2018-01-01 A 10
2018-02-01 A 11
2018-04-01 A 13
2017-08-01 B 20
2017-10-01 B 21
2017-11-01 B 23
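For reproducibility, here's a minimal snippet that builds this sample (assuming date is an actual datetime column, not a string):

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01', '2018-02-01', '2018-04-01',
                            '2017-08-01', '2017-10-01', '2017-11-01']),
    'ID': ['A', 'A', 'A', 'B', 'B', 'B'],
    'value': [10, 11, 13, 20, 21, 23],
})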
Each group can have very different dates, and there are about 400k groups. What I want is to fill in the missing dates for each group efficiently, so it looks like this:
date ID value
2018-01-01 A 10
2018-02-01 A 11
2018-03-01 A nan
2018-04-01 A 13
2017-08-01 B 20
2017-09-01 B nan
2017-10-01 B 21
2017-11-01 B 23
I've tried two approaches:
# 'MS' (month start) matches the monthly dates; 'D' would insert a row per day
df2 = df.groupby('ID').apply(lambda x: x.set_index('date').resample('MS').pad())
And also:
# Build the full (date, ID) cross product, keeping the nan rows
df2 = df.set_index(['date', 'ID']).unstack().stack(dropna=False).reset_index()
df2 = df2.sort_values(by=['ID', 'date']).reset_index(drop=True)
# Drop rows before each group's first observation and after its last
df2 = df2[df2.groupby('ID').value.ffill().notna()]
df2 = df2[df2.groupby('ID').value.bfill().notna()]
The first one is very slow because it uses apply. I guess I could use something other than pad so I get nan instead of the previous value, but I'm not sure that alone would improve performance enough; I waited around 15 minutes and it still hadn't finished running.
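For reference, a sketch of that variant (still apply-based, so I'd expect it to be similarly slow): asfreq leaves the inserted rows as nan instead of forward-filling them:

df2 = (df.groupby('ID')
         .apply(lambda x: x.set_index('date').resample('MS').asfreq())
         .drop(columns='ID')  # ID is now in the index; drop the stale column
         .reset_index())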
The second one fills every date that appears anywhere in the dataframe into every group, which produces a massive intermediate dataframe; afterwards I drop all the leading and trailing nan rows this method generates.
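For a sense of scale: stack(dropna=False) builds the full cross product of observed dates and IDs, so the size of the intermediate can be estimated up front:

# rows in the unstack/stack intermediate: every observed date x every ID
n_intermediate = df['date'].nunique() * df['ID'].nunique()
print(n_intermediate, 'rows, vs', len(df), 'actual observations')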
This is quite a bit faster than the first option, but it still doesn't seem like the best approach. Is there a better way to do this, one that scales to big dataframes?