Having an extremely large events table in BigQuery:
uid |event_time |
------+-----------------------+
1 |2020-03-01 14:43:01.380|
------+-----------------------+
1 |2020-03-01 14:53:01.320|
------+-----------------------+
2 |2020-03-01 15:03:01.180|
...
I'm trying to count the number of sessions based on the time passed since the last event, if more then 10 minutes passed I consider it to be a new event.
I've written the following query:
with sessions as(
select uid ,event_time,
lag(event_time ,1) over (PARTITION BY uid ORDER BY event_time) previous_event_time
FROM `af-raw-data.prodv1.inapps_android_regular`
WHERE _PARTITIONDATE BETWEEN '2020-03-01' AND '2020-03-02'
limit 1000
)
select *, TIMESTAMP_DIFF(TIMESTAMP(event_time),TIMESTAMP(previous_event_time),MINUTE) minuets_passed,
case(previous_event_time is null
or
TIMESTAMP_DIFF(TIMESTAMP(event_time),TIMESTAMP(previous_event_time),MINUTE) > 5)
when true then 1 else 0 end is_new_session from sessions
It works fine and produce the expected output, however it uses a limit of 1000 records, when I try to run it on a day of data BigQuery crashes.
I'm looking for other way to count the number of 10 minuets sessions per user.