0

In BigQuery, I'd like to replicate the Google Analytics 4 statistic 'engagement rate', which is defined as (more info):

sessions with engagement / total sessions

It is required to calculate for all platforms (iOS / Android / Web). In BigQuery, I'm using the default Google Analytics 4 data import tables.

I hereby see various engagement-related parameters, and even the same parameter with different value types, which confuses me a bit:

  1. parameter 'session_engaged'; for all session_start events, this parameter is included as type integer. The parameter only is added when session_engaged = 1; data for all platforms (ios, android, web).
  2. parameter 'session_engaged'; for all events except session_start events, this parameter is included as type string. The parameter is included in 100% of all existing (web) events with either the value '0' or '1'; data is ONLY available for platform = 'web'.
  3. parameter 'engaged_session_event'; this parameter is included (only) as type integer; the parameter is only included in an event when value = 1. data for all platforms (ios, android, web)
  4. And then there is also the parameter engagement_time_msec; I didn't use this parameter in the scope of this post, since I still doubt the validity of this parameter (see also an earlier post in which I questioned the parameter values)

For the replication of engagement rate, it seems option 2 of the above can't be used, since data is only for web, and I need the calculation to do for also ios and android.

Following 'option 1' or 'option 3' from the above: The output in BigQuery between these options is more or less equal, there is a very little difference. Comparing the output of both options to the output in the GA4 UI, the numbers don't match; in the GA4 UI, they are consistently 3-4% higher for each platform.

Query following 'option 1':

SELECT platform,
SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = 1 THEN CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),COUNT(DISTINCT CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))) AS engagement_rate
FROM `[project id].[dataset id].events_*`
WHERE _table_suffix between '20221008' AND '20221008'
GROUP BY 1

Query following 'option 3':

SELECT platform,
SAFE_DIVIDE(COUNT(DISTINCT CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engaged_session_event') = 1 THEN CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) END),COUNT(DISTINCT CONCAT(user_pseudo_id,(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')))) AS engagement_rate
FROM `[project id].[dataset id].events_*`
WHERE _table_suffix between '20221008' AND '20221008'
GROUP BY 1

Clear documentation from Google side seem to be missing regarding the parameters session_engaged and engaged_session_event.

I'm looking for more clarity around the following questions:

  1. What does each parameter really means, what is the context around the values of each parameter and what are the differences between them?
  2. In which case which parameter should be used.
  3. How to calculate the 'engagement rate' in BigQuery, and replicate the numbers as displayed in the GA4 UI.

Does someone knows more about this?

Thanks in advance!

Timo Rietveld
  • 451
  • 2
  • 6
  • 20

1 Answers1

0

Option 1 it is!

safe_divide(count(distinct case when (select value.string_value from unnest(event_params) where key = 'session_engaged') = '1' then concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')) end),count(distinct concat(user_pseudo_id,(select value.int_value from unnest(event_params) where key = 'ga_session_id')))) as engagement_rate