3

I am a beginner in python. I have a huge dataframe. The data looks like this:

df
ID  Annotation  Time
A   Boarding    7:20:00
A   Alighting   8:30:50
B   Boarding    13:45:00
B   Alighting   14:00:05
C   Boarding    17:05:00
C   Alighting   17:15:00

I want to calculate travel time between boarding and alighting for each ID. My expected result looks like this: Result

ID  Time Boarding   Time Alighting  Travel Time (Minutes)
A   7:20:00         8:30:50         70.83
B   13:45:00        14:00:05        15.08
C   17:05:00        17:15:00        10.00

I need advice. thank you in advance.

Arief Hidayat
  • 937
  • 1
  • 8
  • 19
  • 2
    This looks like a good place to use `pivot`. I've not tried this myself but I think the below links can help get you going: https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe https://stackoverflow.com/questions/37840812/pandas-subtracting-two-date-columns-and-the-result-being-an-integer/37848869 – Jake Losh Jul 04 '19 at 04:03

2 Answers2

1

It is indeed a case of pivot:

# change to datetime
df['Time'] = pd.to_datetime(df['Time'])

new_df = df.pivot(index='ID', columns='Annotation', values='Time')
s = (new_df['Alighting'] - new_df['Boarding'])
new_df['Travel Time'] = s.dt.seconds / 60

Output:

Annotation           Alighting            Boarding  Travel Time
ID                                                             
A          2019-07-04 08:30:50 2019-07-04 07:20:00    70.833333
B          2019-07-04 14:00:05 2019-07-04 13:45:00    15.083333
C          2019-07-04 17:15:00 2019-07-04 17:05:00    10.000000
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
1

A solution without pivot:

>>> df2 = pd.DataFrame({'Time %s' % i: pd.to_datetime(pd.Series(x.values.ravel()))
                      for i, x in df.iloc[:, 1:].set_index('Annotation').T.groupby(level=0, axis=1)})
>>> df2['ID'] = df['ID'].unique()
>>> df2['Travel Time (Minutes)'] = (df2['Time Alighting'] - df2['Time Boarding']).dt.seconds / 60
>>> df2 = df2[['ID', 'Time Boarding', 'Time Alighting', 'Travel Time (Minutes)']]
>>> df2
  ID       Time Boarding      Time Alighting  Travel Time (Minutes)
0  A 2019-07-04 07:20:00 2019-07-04 08:30:50              70.833333
1  B 2019-07-04 13:45:00 2019-07-04 14:00:05              15.083333
2  C 2019-07-04 17:05:00 2019-07-04 17:15:00              10.000000
>>> 
U13-Forward
  • 69,221
  • 14
  • 89
  • 114