if you want to add a column with the list of dates tat are missing in between, this shoudl work. This could be more efficient and it has to work around the NaT
in the last row and becomes a bit longer as intended, but gives you the result.
import pandas as pd
from datetime import timedelta
test_df = pd.DataFrame({
"Dates" :
["11-Feb-18", "18-Feb-18", "03-Mar-18", "25-Mar-18", "29-Mar-18", "04-Apr-18",
"08-Apr-18", "14-Apr-18", "17-Apr-18", "30-Apr-18", "04-May-18"]
})
res = (
test_df
.assign(
# convert to datetime
Dates = lambda x : pd.to_datetime(x.Dates),
# get next rows date
Dates_next = lambda x : x.Dates.shift(-1),
# create the date range
Dates_list = lambda x : x.apply(
lambda x :
pd.date_range(
x.Dates + timedelta(days=1),
x.Dates_next - timedelta(days=1),
freq="D").date.tolist()
if pd.notnull(x.Dates_next)
else None
, axis = 1
))
)
print(res)
results in:
Dates Dates_next Dates_list
0 2018-02-11 2018-02-18 [2018-02-12, 2018-02-13, 2018-02-14, 2018-02-1...
1 2018-02-18 2018-03-03 [2018-02-19, 2018-02-20, 2018-02-21, 2018-02-2...
2 2018-03-03 2018-03-25 [2018-03-04, 2018-03-05, 2018-03-06, 2018-03-0...
3 2018-03-25 2018-03-29 [2018-03-26, 2018-03-27, 2018-03-28]
4 2018-03-29 2018-04-04 [2018-03-30, 2018-03-31, 2018-04-01, 2018-04-0...
5 2018-04-04 2018-04-08 [2018-04-05, 2018-04-06, 2018-04-07]
6 2018-04-08 2018-04-14 [2018-04-09, 2018-04-10, 2018-04-11, 2018-04-1...
7 2018-04-14 2018-04-17 [2018-04-15, 2018-04-16]
8 2018-04-17 2018-04-30 [2018-04-18, 2018-04-19, 2018-04-20, 2018-04-2...
9 2018-04-30 2018-05-04 [2018-05-01, 2018-05-02, 2018-05-03]
10 2018-05-04 NaT None
As a sidenote, if you don't need the last row after the analysis, you could filter out the last row after assigning the next date and eliminate the if
statement to make it faster.