Fill sequential date between start & end date from two different column of pandas data frame

Question

I'm using jupyterlab version 3.1.9. I have a pandas dataframe df. df contains start & end date. I would like to create a new data frame df1 from df so that it will have all the date between start & end date & all other columns remain same. My Sample df data looks like

ProductId     StartDate     EndDate
    1         2020-05-21    2020-05-22
    2         2020-04-16    2020-04-18
    3         2020-07-25    2020-07-26
    4         2020-09-16    2020-09-20

My new data frame df1 will look like

ProductId        Date     
    1         2020-05-21    
    1         2020-05-22    
    2         2020-04-16
    2         2020-04-17
    2         2020-04-18    
    3         2020-07-25    
    3         2020-07-26 
    4         2020-09-16    
    4         2020-09-17    
    4         2020-09-18    
    4         2020-09-19    
    4         2020-09-20

Can you suggest me how to do this in python?

score 4 · Accepted Answer · answered Sep 15 '21 at 14:26

Create the list of date then explode it

df['new'] = [pd.date_range(x, y ) for x, y in zip(df.StartDate, df.EndDate)]
out = df.explode('new')
Out[37]: 
   ProductId  StartDate    EndDate        new
0          1 2020-05-21 2020-05-22 2020-05-21
0          1 2020-05-21 2020-05-22 2020-05-22
1          2 2020-04-16 2020-04-18 2020-04-16
1          2 2020-04-16 2020-04-18 2020-04-17
1          2 2020-04-16 2020-04-18 2020-04-18
2          3 2020-07-25 2020-07-26 2020-07-25
2          3 2020-07-25 2020-07-26 2020-07-26
3          4 2020-09-16 2020-09-20 2020-09-16
3          4 2020-09-16 2020-09-20 2020-09-17
3          4 2020-09-16 2020-09-20 2020-09-18
3          4 2020-09-16 2020-09-20 2020-09-19
3          4 2020-09-16 2020-09-20 2020-09-20

score 0 · Answer 2 · answered Sep 15 '21 at 15:22

Here is an answer inspired by that of @BENY.

I gave myself the challenge of using only itertools, zip/map and the dataframe constructor.

Unplanned consequence, it is quite faster.

from itertools import starmap, product, chain
dr = starmap(pd.date_range, zip(df['StartDate'], df['EndDate']))
pd.DataFrame(chain(*map(lambda i: list(product([i[0]], i[1])), zip(df['ProductId'], dr))),
             columns=['ProductId', 'Date'])

Fill sequential date between start & end date from two different column of pandas data frame

2 Answers2

Linked

Related