0

I have a dataframe has two columns: unix_time and user. It has thousands of rows, this is part of it:

unix_time       user
2000000000000   A
2000000000001   A
2000000000002   B
2000000000003   B
2000000000004   B

I want to calculate how much unix_time each user spent in total by:
1. calculating time difference between rows. eg: unix_time column (row2 - row1)
2. sum the time difference if they are from the same user. eg: sum(row2 - row1) and (row3 - row2)

output will be

time_difference_sum  user
1                    A
2                    B

I read several posts such as these two but still struggle to find a solution because I got more constraints. Any suggestions about how can I do this ? Thank you in advanced!

rahlf23
  • 8,869
  • 4
  • 24
  • 54
Osca
  • 1,588
  • 2
  • 20
  • 41

1 Answers1

1

You can use groupby() and diff() and then agg() your results:

df['time_difference_sum'] = df.sort_values(['user','unix_time']).groupby('user')['unix_time'].diff()

df.groupby('user').agg({'time_difference_sum': 'sum'})

Yields:

      time_difference_sum
user                     
A                     1.0
B                     2.0
rahlf23
  • 8,869
  • 4
  • 24
  • 54
  • Thank you but I got error `Wrong number of items passed 2, placement implies 1` when I tried on real data. I wonder is that because in real data user B may have smaller unix_time than user A although the row number is later than user A? – Osca Nov 02 '18 at 23:54
  • Do you have more columns in your original dataframe than you show in your question? See my edit, which may resolve that issue. – rahlf23 Nov 03 '18 at 00:07
  • sure thank you! `unix_time user 1417412578004 i2wl1o1n5wsoahi066uf 1417412581732 i2wl1o1n5wsoahi066uf 1417412581733 i2wl1o1n5wsoahi066uf 1417412581733 i2wl1o1n5wsoahi066uf 1417412590099 i2wl1o1n5wsoahi066uf 1417412591780 i2wl1o1n5wsoahi066uf 1417412596325 i2wl1o1n5wsoahi066uf 1417412596326 i2wl1o1n5wsoahi066uf 1417412600171 i2wl1o1n5wsoahi066uf 1417412600171 i2wl1o1n5wsoahi066uf 1417412600172 i2wl1o1n5wsoahi066uf 1417412604687 i2wl1o1n5wsoahi066uf 1417412647448 i2wl1o1n5wsoahi066uf 1417412649678 i2wl1o1n5wsoahi066uf` – Osca Nov 03 '18 at 00:15