0

I have event data logged in BigQuery, where I want to group the requests in the following way:

  1. If a request happens more than x minutes apart from the last request, start a new session
  2. If the session grows longer than y minutes, close the session and start a new one

The only two columns essential for defining a session (above requirements) are a user_id column and a timestamp of when an event took place.

The goal is to construct sessions out of individual user event streams.

Victor M Perez
  • 2,185
  • 3
  • 19
  • 22
axrd
  • 97
  • 1
  • 9
  • #1 is already solved here https://stackoverflow.com/questions/42546815/how-to-calculate-session-and-session-duration-in-firebase-analytics-raw-data – axrd Feb 13 '18 at 02:27
  • #2 needs to be defined as part of the SQL for #1 or brand new SQL to break up the long sessions into sessions of y mins. – axrd Feb 13 '18 at 02:28

1 Answers1

1

You cannot do this in BigQuery. The first condition is easy -- you just need to know the time difference between adjacent rows.

Unfortunately, the second requires "memory". Basically, two individual rows do not give you the information you need. SQL, in general, does have a solution, recursive CTEs. However, these are not supported in BigQuery.

If you wanted to break the sessions at a particular time, say midnight, then you could proceed.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • In the doc referring to the [WITH clause](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#with-clause), you can see that they in nature the same as CTE's. It is explained at the end of the text that _BigQuery does not support WITH RECURSIVE_, i.e., recursive CTE's are not supported. You can point the user to this docs as a way of improving the answer. – Victor M Perez Mar 02 '18 at 14:38