Sum time difference and pivot it - pandas dataframe

Question

I have a dataframe has two columns: unix_time and user. It has thousands of rows, this is part of it:

unix_time       user
2000000000000   A
2000000000001   A
2000000000002   B
2000000000003   B
2000000000004   B

I want to calculate how much unix_time each user spent in total by:
1. calculating time difference between rows. eg: unix_time column (row2 - row1)
2. sum the time difference if they are from the same user. eg: sum(row2 - row1) and (row3 - row2)

output will be

time_difference_sum  user
1                    A
2                    B

I read several posts such as these two but still struggle to find a solution because I got more constraints. Any suggestions about how can I do this ? Thank you in advanced!

rahlf23 · Accepted Answer · 2018-11-03T00:07:11.330

1

You can use groupby() and diff() and then agg() your results:

df['time_difference_sum'] = df.sort_values(['user','unix_time']).groupby('user')['unix_time'].diff()

df.groupby('user').agg({'time_difference_sum': 'sum'})

Yields:

      time_difference_sum
user                     
A                     1.0
B                     2.0

edited Nov 03 '18 at 00:07

answered Nov 02 '18 at 23:22

rahlf23

8,869
4
24
54

Thank you but I got error `Wrong number of items passed 2, placement implies 1` when I tried on real data. I wonder is that because in real data user B may have smaller unix_time than user A although the row number is later than user A? – Osca Nov 02 '18 at 23:54
Do you have more columns in your original dataframe than you show in your question? See my edit, which may resolve that issue. – rahlf23 Nov 03 '18 at 00:07
sure thank you! `unix_time user 1417412578004 i2wl1o1n5wsoahi066uf 1417412581732 i2wl1o1n5wsoahi066uf 1417412581733 i2wl1o1n5wsoahi066uf 1417412581733 i2wl1o1n5wsoahi066uf 1417412590099 i2wl1o1n5wsoahi066uf 1417412591780 i2wl1o1n5wsoahi066uf 1417412596325 i2wl1o1n5wsoahi066uf 1417412596326 i2wl1o1n5wsoahi066uf 1417412600171 i2wl1o1n5wsoahi066uf 1417412600171 i2wl1o1n5wsoahi066uf 1417412600172 i2wl1o1n5wsoahi066uf 1417412604687 i2wl1o1n5wsoahi066uf 1417412647448 i2wl1o1n5wsoahi066uf 1417412649678 i2wl1o1n5wsoahi066uf` – Osca Nov 03 '18 at 00:15

Sum time difference and pivot it - pandas dataframe

1 Answers1