In BigQuery, I'd like to replicate the Google Analytics 4 statistic 'engagement rate', which is defined as (more info):
sessions with engagement / total sessions
It is required to calculate for all platforms (iOS / Android / Web). In BigQuery, I'm using the default Google Analytics 4 data import tables.
I hereby see various engagement-related parameters, and even the same parameter with different value types, which confuses me a bit:
- parameter 'session_engaged'; for all session_start events, this parameter is included as type integer. The parameter only is added when session_engaged = 1; data for all platforms (ios, android, web).
- parameter 'session_engaged'; for all events except session_start events, this parameter is included as type string. The parameter is included in 100% of all existing (web) events with either the value '0' or '1'; data is ONLY available for platform = 'web'.
- parameter 'engaged_session_event'; this parameter is included (only) as type integer; the parameter is only included in an event when value = 1. data for all platforms (ios, android, web)
- And then there is also the parameter engagement_time_msec; I didn't use this parameter in the scope of this post, since I still doubt the validity of this parameter (see also an earlier post in which I questioned the parameter values)
For the replication of engagement rate, it seems option 2 of the above can't be used, since data is only for web, and I need the calculation to do for also ios and android.
Following 'option 1' or 'option 3' from the above: The output in BigQuery between these options is more or less equal, there is a very little difference. Comparing the output of both options to the output in the GA4 UI, the numbers don't match; in the GA4 UI, they are consistently 3-4% higher for each platform.
Query following 'option 1':
SELECT platform,
SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1 THEN CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),COUNT(DISTINCT CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))) AS engagement_rate
FROM `[project id].[dataset id].events_*`
WHERE _table_suffix between '20221008' AND '20221008'
GROUP BY 1
Query following 'option 3':
SELECT platform,
SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engaged_session_event') = 1 THEN CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),COUNT(DISTINCT CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))) AS engagement_rate
FROM `[project id].[dataset id].events_*`
WHERE _table_suffix between '20221008' AND '20221008'
GROUP BY 1
Clear documentation from Google side seem to be missing regarding the parameters session_engaged and engaged_session_event.
I'm looking for more clarity around the following questions:
- What does each parameter really means, what is the context around the values of each parameter and what are the differences between them?
- In which case which parameter should be used.
- How to calculate the 'engagement rate' in BigQuery, and replicate the numbers as displayed in the GA4 UI.
Does someone knows more about this?
Thanks in advance!