2

Assume the below dataframe, df

   Start_Date   End_Date
0   20201101    20201130
1   20201201    20201231
2   20210101    20210131
3   20210201    20210228
4   20210301    20210331

How to Calculate time difference between two date columns in days?

Required Output

   Start_Date   End_Date   Diff_in_Days
0   20201101    20201130
1   20201201    20201231
2   20210101    20210131
3   20210201    20210228
4   20210301    20210331
  • Does this answer your question? [Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes](https://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu) – Trenton McKinney Aug 23 '20 at 01:09

1 Answers1

3

First idea is convert columns to datetimes, get difference and convert timedeltas to days by Series.dt.days:

df['Diff_in_Days'] = (pd.to_datetime(df['End_Date'], format='%Y%m%d')
                        .sub(pd.to_datetime(df['Start_Date'], format='%Y%m%d'))
                        .dt.days)
print (df)
   Start_Date  End_Date  Diff_in_Days
0    20201101  20201130            29
1    20201201  20201231            30
2    20210101  20210131            30
3    20210201  20210228            27
4    20210301  20210331            30

Another solution better if processing datetimes later is reassign back columns and use solution above:

df['Start_Date'] = pd.to_datetime(df['Start_Date'], format='%Y%m%d')
df['End_Date'] = pd.to_datetime(df['End_Date'], format='%Y%m%d')

df['Diff_in_Days'] = df['End_Date'].sub(df['Start_Date']).dt.days
print (df)
  Start_Date   End_Date  Diff_in_Days
0 2020-11-01 2020-11-30            29
1 2020-12-01 2020-12-31            30
2 2021-01-01 2021-01-31            30
3 2021-02-01 2021-02-28            27
4 2021-03-01 2021-03-31            30
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252