How to group sessions of 30 minutes relative to an initial timestamp in Pandas?

Question

I have a number of users who visit my website, and I need to group their visits into sessions. A session is a 30-minute window for a given ID, starting at that ID's first login. Anything more than 30 minutes after that is referred to as a new session.

Sample input:

id,timestamp_datetime
1,2020-04-25 21:28:57.499  # id 1 - first session
1,2020-04-25 21:41:41.691
1,2020-04-25 21:41:11.055
1,2020-04-25 22:00:00.015  # id 1 - second session (more than 30 minutes)
2,2020-04-25 21:41:41.691  # id 2 - first session
2,2020-04-25 22:00:00.015
2,2020-04-25 22:30:03.838  # id 2 - second session
3,2020-04-25 21:41:41.691

Sample output:

id, count_sessions
1, 2
2, 2
3, 1

I have tried this

df.groupby([df.index.to_period('30T'),"id"]).count()

But it gave me the wrong results. Please help me fix it.

Did you solve it? If you provide a sample as https://stackoverflow.com/q/20109391/6692898 I can give it another go — RichieV, Jul 31 '20 at 19:30

score 0 · Answer 1 · answered Jul 26 '20 at 16:25


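Something like this (a sketch; it assumes timestamp_datetime is a datetime column and the rows are sorted by time within each id):

minutes = df.groupby('id')['timestamp_datetime'].diff().dt.total_seconds().div(60).fillna(0)
np.ceil(minutes.groupby(df['id']).cumsum() / 30).clip(lower=1)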
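For reference, here is a minimal end-to-end sketch of the same idea on the sample data from the question. It rests on one reading of the question, namely that sessions are fixed 30-minute windows anchored at each id's earliest timestamp. The names first_seen, elapsed_seconds, session and count_sessions are chosen for illustration only.

```python
import io

import pandas as pd

# Sample rows from the question, with the inline comments removed.
csv = """id,timestamp_datetime
1,2020-04-25 21:28:57.499
1,2020-04-25 21:41:41.691
1,2020-04-25 21:41:11.055
1,2020-04-25 22:00:00.015
2,2020-04-25 21:41:41.691
2,2020-04-25 22:00:00.015
2,2020-04-25 22:30:03.838
3,2020-04-25 21:41:41.691
"""
df = pd.read_csv(io.StringIO(csv), parse_dates=["timestamp_datetime"])

# Seconds elapsed since each id's earliest timestamp (row order does not matter).
first_seen = df.groupby("id")["timestamp_datetime"].transform("min")
elapsed_seconds = (df["timestamp_datetime"] - first_seen).dt.total_seconds()

# Assumption: sessions are fixed 30-minute (1800 s) windows anchored at the
# first timestamp. With floor division, an event exactly 30 minutes after the
# first one already falls into the next window.
df["session"] = (elapsed_seconds // 1800).astype(int)

# Count the distinct windows that actually contain events for each id.
count_sessions = (
    df.groupby("id")["session"].nunique().rename("count_sessions").reset_index()
)
print(count_sessions)
```

On the sample data this gives 2, 2 and 1 sessions for ids 1, 2 and 3, matching the expected output. If a new session should instead restart the 30-minute clock at its own first event, fixed windows are not enough and the session boundaries would have to be computed sequentially within each id.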


RichieV
