18

I have a dataframe that looks like this:

from    to         datetime              other
-------------------------------------------------
11      1     2016-11-06 22:00:00          -
11      1     2016-11-06 20:00:00          -
11      1     2016-11-06 15:45:00          -
11      12    2016-11-06 15:00:00          -
11      1     2016-11-06 12:00:00          -
11      18    2016-11-05 10:00:00          -
11      12    2016-11-05 10:00:00          -
12      1     2016-10-05 10:00:59          -
12      3     2016-09-06 10:00:34          -

I want to groupby "from" and then "to" columns and then sort the "datetime" in descending order and then finally want to calculate the time difference within these grouped by objects between the current time and the next time. For eg, in this case, I would like to have a dataframe like the following:

from    to     timediff in minutes                                          others
11      1            120
11      1            255
11      1            225
11      1            0 (preferrably subtract this date from the epoch)
11      12           300
11      12           0
11      18           0
12      1            25
12      3            0

I can't get my head around figuring this out!! Is there a way out for this? Any help will be much much appreciated!! Thank you so much in advance!

piRSquared
  • 285,575
  • 57
  • 475
  • 624
Gingerbread
  • 1,938
  • 8
  • 22
  • 36
  • [Does this post help](http://stackoverflow.com/q/2788871/6912791)? It's just a simple way to compare DateTime objects. I am not extremely familiar with dataframes, but If I remember correctly you can loop over certain columns. – Pike D. Jan 30 '17 at 06:13

3 Answers3

19
df.assign(
    timediff=df.sort_values(
        'datetime', ascending=False
    ).groupby(['from', 'to']).datetime.diff(-1).dt.seconds.div(60).fillna(0))

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
17

I think you need:

groupby with apply sort_values with diff, convert Timedelta to minutes by seconds and floor division 60

fillna and sort_index, remove level 2 in index

df = df.groupby(['from','to']).datetime
       .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
       .fillna(0)
       .sort_index()
       .reset_index(level=2, drop=True)
       .reset_index(name='timediff in minutes')

print (df)

   from  to  timediff in minutes 
0    11   1                 120.0
1    11   1                 255.0
2    11   1                 225.0
3    11   1                   0.0
4    11  12                 300.0
5    11  12                   0.0
6    11  18                   0.0
7    12   3                   0.0
8    12   3                   0.0

df = df.join(df.groupby(['from','to'])
               .datetime
               .apply(lambda x: x.sort_values().diff().dt.seconds // 60)
               .fillna(0)
               .reset_index(level=[0,1], drop=True)
               .rename('timediff in minutes'))
print (df)
   from  to            datetime other  timediff in minutes
0    11   1 2016-11-06 22:00:00     -                120.0
1    11   1 2016-11-06 20:00:00     -                255.0
2    11   1 2016-11-06 15:45:00     -                225.0
3    11  12 2016-11-06 15:00:00     -                300.0
4    11   1 2016-11-06 12:00:00     -                  0.0
5    11  18 2016-11-05 10:00:00     -                  0.0
6    11  12 2016-11-05 10:00:00     -                  0.0
7    12   3 2016-10-05 10:00:59     -                  0.0
8    12   3 2016-09-06 10:00:34     -                  0.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
14

Almost as above, but without apply:

result = df.sort_values(['from','to','datetime'])\
           .groupby(['from','to'])['datetime']\
           .diff().dt.seconds.fillna(0)
DYZ
  • 55,249
  • 10
  • 64
  • 93
  • 1
    I've found using .apply() as in jezrael's answer just slightly faster than calling .diff() directly (e.g. %timeit on 1000 rows, 100 loops, best of 3: 10.9ms/loop with apply and 11.1ms/loop without apply) – Quetzalcoatl Oct 09 '17 at 20:32
  • ``` .astype('timedelta64[m]').fillna(0) ``` instead of the ``` .dt.seconds.fillna(0) ``` is also comfortable – Talis Sep 16 '20 at 12:22