0

I have a DataFrame as such:

ID                Departuredate Arrivaldate
000afb96ded6677c  2014-02-08    2014-09-08        
                  2015-10-22    2015-10-26        
                  2017-02-19    2017-02-26              
                  ..

ffea14e87a4e1269  2020-02-18    2020-02-23        
                  2020-07-18    2020-07-23        

I need to find the date difference from each trip with the previous trip from that specific id and put into column next to it like this:

    ID            Departuredate Arrivaldate  Time_Difference_Between_Trips
000afb96ded6677c  2014-02-08    2014-09-08   0 (days) # 0 days because there is no previous trip    
                  2015-10-22    2015-10-26   X (days) # 2015-10-22 - 2014-09-08   
                  2017-02-19    2017-02-26   X (days) # 2017-02-19 - 2015-10-26              
                  ..

ffea14e87a4e1269  2020-02-18    2020-02-23  X (days) # 0 days because there is no previous trip    
                  2020-07-18    2020-07-23  X (days) # 2020-07-18 - 2020-02-23              
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Futurex
  • 23
  • 4
  • 1
    Can you post the actual dataframe instead of posting the pics? check - https://stackoverflow.com/help/minimal-reproducible-example – Nk03 Jun 05 '21 at 14:39
  • If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker Jun 05 '21 at 14:46
  • @Nk03 i posted the actual dataframe sorry – Futurex Jun 05 '21 at 15:12

1 Answers1

2

Try groupby shift + fillna:

df['Time_Difference_Between_Trips'] = (
    (df['Departuredate'] - df.groupby('ID')['Arrivaldate'].shift())
        .fillna(pd.Timedelta(days=0))
)
                 ID Departuredate Arrivaldate Time_Difference_Between_Trips
0  000afb96ded6677c    2014-02-08  2014-09-08                        0 days
1  000afb96ded6677c    2015-10-22  2015-10-26                      409 days
2  000afb96ded6677c    2017-02-19  2017-02-26                      482 days
3  ffea14e87a4e1269    2020-02-18  2020-02-23                        0 days
4  ffea14e87a4e1269    2020-07-18  2020-07-23                      146 days

(If ID is the index instead of a column groupby level=0 instead):

df['Time_Difference_Between_Trips'] = (
    (df['Departuredate'] - df.groupby(level=0)['Arrivaldate'].shift())
        .fillna(pd.Timedelta(days=0))
)

Complete working example:

import pandas as pd

df = pd.DataFrame({
    'ID': ['000afb96ded6677c', '000afb96ded6677c', '000afb96ded6677c',
           'ffea14e87a4e1269', 'ffea14e87a4e1269'],
    'Departuredate': ['2014-02-08', '2015-10-22', '2017-02-19', '2020-02-18',
                      '2020-07-18'],
    'Arrivaldate': ['2014-09-08', '2015-10-26', '2017-02-26', '2020-02-23',
                    '2020-07-23']
})
df['Departuredate'] = pd.to_datetime(df['Departuredate'])
df['Arrivaldate'] = pd.to_datetime(df['Arrivaldate'])

df['Time_Difference_Between_Trips'] = (
    (df['Departuredate'] - df.groupby('ID')['Arrivaldate'].shift())
        .fillna(pd.Timedelta(days=0))
)
print(df)
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57