I have a number of users who visit my website, and I need to group their visits into sessions. A session covers 30 minutes of activity by the same ID, counted from that session's first login. Any activity more than 30 minutes after that first login is treated as a new session.

Sample input:

id,timestamp_datetime
1,2020-04-25 21:28:57.499  # id 1 - first session
1,2020-04-25 21:41:41.691
1,2020-04-25 21:41:11.055
1,2020-04-25 22:00:00.015  # id 1 - second session (more than 30 minutes)
2,2020-04-25 21:41:41.691  # id 2 - first session
2,2020-04-25 22:00:00.015
2,2020-04-25 22:30:03.838  # id 2 - second session
3,2020-04-25 21:41:41.691

Sample output:

id, count_sessions
1, 2
2, 2
3, 1

I have tried this:

df.groupby([df.index.to_period('30T'),"id"]).count()

But it gives me the wrong results. How can I fix it?

Vishesh Mangla
  • Did you solve it? If you provide a sample as https://stackoverflow.com/q/20109391/6692898 I can give it another go – RichieV Jul 31 '20 at 19:30

1 Answer

Something like:

np.ceil(df.groupby('id').diff().cumsum()/30)
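
For reference, here is a minimal sketch of the rule as the question states it, where each session runs for 30 minutes from its own first event. It assumes `id` and `timestamp_datetime` are ordinary columns (the question's `df` seems to use the timestamp as its index), and `count_sessions` is a hypothetical helper name, not a pandas function. Note that `to_period('30T')` bins are aligned to the clock (21:00, 21:30, 22:00), not to each user's first login, which is why that grouping splits id 1 into three groups instead of two.

```python
import io

import pandas as pd

# Sample rows from the question, without the inline comments.
SAMPLE = """id,timestamp_datetime
1,2020-04-25 21:28:57.499
1,2020-04-25 21:41:41.691
1,2020-04-25 21:41:11.055
1,2020-04-25 22:00:00.015
2,2020-04-25 21:41:41.691
2,2020-04-25 22:00:00.015
2,2020-04-25 22:30:03.838
3,2020-04-25 21:41:41.691
"""


def count_sessions(timestamps, window=pd.Timedelta(minutes=30)):
    """Count sessions; a session lasts `window` from its first event."""
    count = 0
    session_start = None
    # Sort first, because the sample rows are not strictly ordered.
    for ts in timestamps.sort_values():
        if session_start is None or ts - session_start > window:
            count += 1          # this event opens a new session
            session_start = ts  # the window is re-anchored here
    return count


df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["timestamp_datetime"])
result = (
    df.groupby("id")["timestamp_datetime"]
      .apply(count_sessions)
      .reset_index(name="count_sessions")
)
print(result)
# For the sample input this gives 2, 2 and 1 sessions for ids 1, 2 and 3.
```

The loop is deliberate: because each window is anchored at the first event of its session, the start of one session depends on where the previous one began, so a plain `diff()` between consecutive rows does not capture it. For id 1 the largest gap between consecutive rows is under 19 minutes, yet 22:00:00 is more than 30 minutes after the first login at 21:28:57.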
RichieV