-2

I have created a dataframe using below python code.

import pandas as pd
import datetime as dt

d = {'StartDate': pd.Series(["2018-11-01", "2018-11-04", "2018-11-06"]),
     'EndDate': pd.Series(["2018-11-03", "2018-11-05", "2018-11-10"])}
df = pd.DataFrame(d)
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])

I would like to have a column as Date, which will be having the dates between startdate and enddate columns values.

Expected Output:-

enter image description here

I did tried the same thing in R which i am familiar already.

R Script:- 

   df1 %>%
   rowwise() %>% 
   do(data.frame(.[1:2], date = seq(.$min_date, .$max_date, by = "1 day")))

Can anyone please suggest me?

yatu
  • 86,083
  • 12
  • 84
  • 139
Bunny
  • 409
  • 3
  • 21

1 Answers1

2

date_range + merge

You can create a date range using pd.date_range and merge the two dataframes setting as outer the merging type. Finally you can fill the missing values using the ffill method in fillna(), which will propagate the last valid observation forward to next valid backfill.

print(df)

  StartDate    EndDate
0 2018-11-01 2018-11-03
1 2018-11-04 2018-11-05
2 2018-11-06 2018-11-10

dates = pd.DataFrame(pd.date_range(start=df.min().StartDate, 
                     end=df.max().EndDate), columns=['Date'])
pd.merge(left=dates, right=df, left_on='Date', right_on='StartDate', 
         how='outer').fillna(method='ffill')

    Date  StartDate    EndDate
0 2018-11-01 2018-11-01 2018-11-03
1 2018-11-02 2018-11-01 2018-11-03
2 2018-11-03 2018-11-01 2018-11-03
3 2018-11-04 2018-11-04 2018-11-05
4 2018-11-05 2018-11-04 2018-11-05
5 2018-11-06 2018-11-06 2018-11-10
6 2018-11-07 2018-11-06 2018-11-10
7 2018-11-08 2018-11-06 2018-11-10
8 2018-11-09 2018-11-06 2018-11-10
9 2018-11-10 2018-11-06 2018-11-10
yatu
  • 86,083
  • 12
  • 84
  • 139