0

df has three columns - date, name, and qty. For each name and date combination I want to insert n rows such that name is repeated in these next n rows but date is increased by 1 business day and qty=nan if that name and date combination doesn't exist already in df.

>>> import pandas as pd
>>> from datetime import datetime
>>> df = pd.DataFrame({'name':['abd']*3 + ['pqr']*2 + ['xyz']*1, 'date':[datetime(2020,1,6), datetime(2020,1,8), datetime(2020,2,5), datetime(2017,10,4), datetime(2017,10,13), datetime(2013,5,27)], 'qty':range(6)})
>>> df
  name       date  qty
0  abd 2020-01-06   10
1  abd 2020-01-08    1
2  abd 2020-02-05    2
3  pqr 2017-10-04    3
4  pqr 2017-10-13    4
5  xyz 2013-05-27    5

I am not sure how to go about it. Any thoughts/clues. Thanks a lot!

Desired output for n=3:

   name       date  qty
0   abd 2020-01-06   10
1   abd 2020-01-07  nan
2   abd 2020-01-08    1
3   abd 2020-01-09  nan
4   abd 2020-01-10  nan
5   abd 2020-01-13  nan
6   abd 2020-02-05    2
7   abd 2020-02-08  nan
8   abd 2020-02-09  nan
9   abd 2020-02-10  nan
10  pqr 2017-10-04    3
11  pqr 2017-10-05  nan
12  pqr 2017-10-06  nan
13  pqr 2017-10-09  nan
14  pqr 2017-10-13    4
15  pqr 2017-10-16  nan
16  pqr 2017-10-17  nan
17  pqr 2017-10-18  nan
18  xyz 2013-05-27    5
19  xyz 2013-05-28  nan
20  xyz 2013-05-29  nan
21  xyz 2013-05-30  nan
Gerry
  • 606
  • 6
  • 16

1 Answers1

1

Here is a way:

from functools import reduce

n = 3
new_index = (
    df.groupby("name")
    .apply(
        lambda x: reduce(
            lambda i, j: i.union(j),
            [pd.bdate_range(i, periods=n + 1) for i in x["date"]],
        )
    )
    .explode()
)
midx = pd.MultiIndex.from_frame(new_index.reset_index(), names=["name", "date"])
df_out = df.set_index(["name", "date"]).reindex(midx).reset_index()
df_out

If explode cannot be used:

from functools import reduce

n = 3
new_index = (
    df.groupby("name")
    .apply(
        lambda x: reduce(
            lambda i, j: i.union(j),
            [pd.bdate_range(i, periods=n + 1) for i in x["date"]],
        )
    )
    .apply(pd.Series)
    .stack()
    .reset_index(level=0)
    .rename(columns={0:'date'})
)
df_out = new_index.merge(df, how='left', on=['name', 'date'])
df_out

Output:

   name       date  qty
0   abd 2020-01-06  0.0
1   abd 2020-01-07  NaN
2   abd 2020-01-08  1.0
3   abd 2020-01-09  NaN
4   abd 2020-01-10  NaN
5   abd 2020-01-13  NaN
6   abd 2020-02-05  2.0
7   abd 2020-02-06  NaN
8   abd 2020-02-07  NaN
9   abd 2020-02-10  NaN
10  pqr 2017-10-04  3.0
11  pqr 2017-10-05  NaN
12  pqr 2017-10-06  NaN
13  pqr 2017-10-09  NaN
14  pqr 2017-10-13  4.0
15  pqr 2017-10-16  NaN
16  pqr 2017-10-17  NaN
17  pqr 2017-10-18  NaN
18  xyz 2013-05-27  5.0
19  xyz 2013-05-28  NaN
20  xyz 2013-05-29  NaN
21  xyz 2013-05-30  NaN

How it works:

First import reduce from functools to use pd.Index.union to create a single list of dates. The list of dates is created from pd.bdate_range, with in groupby for each name. Convert that list of new_index, and names to a MultiIndex using pd.MultiIndex.from_frame. Use reindex after set_index on the original dataframe.

Community
  • 1
  • 1
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • It gives me `AttributeError: 'Series' object has no attribute 'explode'`. This could be version error - is there a way to bypass it without upgrading version? – Gerry Jan 31 '21 at 23:45
  • Yep, this is more likely a version issue. Let's see if we can "explode" that list differently. – Scott Boston Jan 31 '21 at 23:51
  • We can try one of these [methods](https://stackoverflow.com/a/53218939/6361531). – Scott Boston Feb 01 '21 at 00:11