16

How to calculate Session Duration in Firebase analytics raw data which is linked to BigQuery?

I have used the following blog to calculate the users by using the flatten command for the events which are nested within each record, but I would like to know how to proceed with in calculating the Session and Session duration by country and time.

(I have many apps configured, but if you could help me with the SQL query for calculating the session duration and session, It would be of immense help)

Google Blog on using Firebase and big query

AL.
  • 36,815
  • 10
  • 142
  • 281
Joshua Johnson
  • 194
  • 1
  • 1
  • 7
  • Firebase recently added some new session parameters. I think [this blog](https://stackoverflow.com/questions/42546815/how-to-calculate-session-and-session-duration-in-firebase-analytics-raw-data) is probably a more accurate now. – Kato Apr 11 '19 at 16:16

4 Answers4

18

First you need to define a session - in the following query I'm going to break a session whenever a user is inactive for more than 20 minutes.

Now, to find all sessions with SQL you can use a trick described at https://blog.modeanalytics.com/finding-user-sessions-sql/.

The following query finds all sessions and their lengths:

#standardSQL

SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
   (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    SELECT *, IF(
                previous IS null 
                OR (min_time-previous)>(20*60*1000*1000),  # sessions broken by this inactivity 
                1, 0) session_start 
                #https://blog.modeanalytics.com/finding-user-sessions-sql/
    FROM (
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
        FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`
      )
    )
  )
)
GROUP BY 1, 2
ORDER BY 1, 2

enter image description here

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • 2
    user_engagement has a **event_dim.params.key** which contains **engagement_time_msec** and the **_event_dim.params.value.int_value_** contains the **engagement time in ms per instance** and do you think this parameter could be used for calculating the session duration? If it is true, it should be more easier than your calculation as SUM up the engagement time to calculate the session duration seems to be straight forward. – Joshua Johnson Mar 03 '17 at 08:23
  • maybe - but using that value won't allow you to define sessions as flexibly as this query would (which works on other datasets too) - I'll be happy to see if someone else contributes an alternative answer too! – Felipe Hoffa Mar 03 '17 at 18:27
  • You save my day – Camille Tolsa Apr 26 '18 at 13:57
  • 1
    Nice script @felipe! There is one thing I want to challenge in this script (if you will permit me). How do you account for user sessions that stretch across 2 day tables? More specifically, users which had sessions that stretched over 20160531 to 20160601, as well as 20160601 to 20160602. I guess one can run the query over multiple tables i.e. WHERE (_TABLE_SUFFIX BETWEEN '20160601' AND '20160630'), but it will still need to account for the beginning and the end of this range. Meaning sessions 20160531 to 20160601 and 20160630 to 20160701). Thoughts? – d_- Jul 25 '18 at 13:33
11

With the new schema of Firebase in BigQuery, I found that the answer by @Maziar did not work for me, but I am not sure why. Instead I have used the following to calculate it, where a session is defined as a user engaging with your app for a minimum of 10 seconds and where the session stops if a user doesn't engage with the app for 30 minutes. It provides total number of sessions and the session length in minutes, and it is based on this query: https://modeanalytics.com/modeanalytics/reports/5e7d902f82de/queries/2cf4af47dba4

SELECT COUNT(*) AS sessions,
       AVG(length) AS average_session_length
  FROM (
  
SELECT global_session_id,
       (MAX(event_timestamp) - MIN(event_timestamp))/(60 * 1000 * 1000) AS length
  FROM (
SELECT user_pseudo_id,
       event_timestamp,
       SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
       SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
  FROM (
       SELECT *,
              CASE WHEN event_timestamp - last_event >= (30*60*1000*1000) 
                     OR last_event IS NULL 
                   THEN 1 ELSE 0 END AS is_new_session
         FROM (
              SELECT user_pseudo_id,
                     event_timestamp,
                     LAG(event_timestamp,1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
                FROM `dataset.events_2019*`
              ) last
       ) final
       ) session
 GROUP BY 1
       
       ) agg
WHERE length >= (10/60)
3

As you know, Google has changed the schema of BigQuery firebase databases: https://support.google.com/analytics/answer/7029846

Thanks to @Felipe answer, the new format will be changed as follow:

SELECT SUM(total_sessions) AS Total_Sessions, AVG(sess_length_seconds) AS Average_Session_Duration
FROM (
  SELECT user_pseudo_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, 
    MAX(sess_id) OVER(PARTITION BY user_pseudo_id) total_sessions,
    (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
  FROM (
    SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY min_time) sess_id
    FROM (
      SELECT *, IF(previous IS null OR (min_time-previous) > (20*60*1000*1000), 1, 0) session_start 
      FROM (
        SELECT *, LAG(max_time, 1) OVER(PARTITION BY user_pseudo_id ORDER BY max_time) previous
        FROM (SELECT user_pseudo_id, MIN(event_timestamp) AS min_time, MAX(event_timestamp) AS max_time
          FROM `dataset_name.table_name` GROUP BY user_pseudo_id)
      )
    )
  )
  GROUP BY 1, 2
  ORDER BY 1, 2
)

Note: change dataset_name and table_name based on your project info

Sample result: enter image description here

2

With the recent change in which we have ga_session_id with each event row in the BigQuery table you can calculate number of sessions and average session length much more easily.

The value of the ga_session_id would remain same for the whole session, so you don't need to define the session separately.

You take the Min and the Max value of the event_timestamp column by grouping the result by user_pseudo_id , ga_session_id and event_date so that you get the session duration of the particular session of any user on any given date.

WITH 
UserSessions as (
SELECT
        user_pseudo_id,
        event_timestamp,
        event_date,
        (Select value.int_value from UNNEST(event_params) where key = "ga_session_id") as session_id,
        event_name
      FROM `projectname.dataset_name.events_*`
),
SessionDuration as (
SELECT
user_pseudo_id,
session_id,
COUNT(*) AS events,
TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration
,event_date
FROM
UserSessions
WHERE session_id is not null
GROUP BY
user_pseudo_id,
session_id
,event_date
)
Select count(session_id) as NumofSessions,avg(session_duration) as AverageSessionLength from SessionDuration 

At last you just do the count of the session_id to get the total number of sessions and do the average of the session duration to get the value of the average session length.

Nikunj Kakadiya
  • 2,689
  • 2
  • 20
  • 35