0

Given a sample dataframe:

ID Start Date End Date
1 01-01-2015 09-08-2022
2 01-01-2016 05-10-2020

How can I create a new column, New Date, that captures the date range between the Start Date and End Date?

ID Start Date End Date New Date
1 01-01-2015 09-08-2022 01-01-2015
1 01-01-2015 09-08-2022 01-02-2015
1 01-01-2015 09-08-2022 01-03-2015
1 01-01-2015 09-08-2022 ...
1 01-01-2015 09-08-2022 09-08-2022
2 01-01-2016 05-10-2020 01-01-2016
2 01-01-2016 05-10-2020 01-02-2016
2 01-01-2016 05-10-2020 01-03-2016
2 01-01-2016 05-10-2020 ...
2 01-01-2016 05-10-2020 05-10-2020

I have tried using reindex but my date columns are not in the index currently, is there a way that would allow me to keep my index intact?

Daniel
  • 691
  • 2
  • 8
  • 19

1 Answers1

3

Given:

   ID  Start_Date    End_Date
0   1  01-01-2015  09-08-2022
1   2  01-01-2016  05-10-2020

Doing:

# Ensure your columns are proper datetimes:
for col in ('Start_Date', 'End_Date'):
    df[col] = pd.to_datetime(df[col])

# For each row, create a daily DatetimeIndex from Start_Date to End_Date
df['New_Date'] = df.apply(lambda x: pd.date_range(x.Start_Date, x.End_Date, freq='1d'), axis=1)

# Explode this column of DatetimeIndices:
df = df.explode('New_Date')
print(df)

Output:

    ID Start_Date   End_Date   New_Date
0    1 2015-01-01 2022-09-08 2015-01-01
0    1 2015-01-01 2022-09-08 2015-01-02
0    1 2015-01-01 2022-09-08 2015-01-03
0    1 2015-01-01 2022-09-08 2015-01-04
0    1 2015-01-01 2022-09-08 2015-01-05
..  ..        ...        ...        ...
1    2 2016-01-01 2020-05-10 2020-05-06
1    2 2016-01-01 2020-05-10 2020-05-07
1    2 2016-01-01 2020-05-10 2020-05-08
1    2 2016-01-01 2020-05-10 2020-05-09
1    2 2016-01-01 2020-05-10 2020-05-10

[4400 rows x 4 columns]
BeRT2me
  • 12,699
  • 2
  • 13
  • 31