It is true that there was a schema change in the Google Analytics for Firebase BigQuery Export. Although there is no clear mapping of the old fields as compared to the new ones, the SQL query that is provided in the documentation in order to migrate existing BQ datasets from the old schema to the new one provides some hints of how have these fields changed.
I share the migration_script.sql
SQL query below, just for reference, but let me pin-point the most relevant changes for your use-case:
- event_dim is mapped as event in the SQL query, but does not have any final representation in the schema, because event_dim is no longer a nested field:
UNNEST(event_dim) AS event
- event_dim.timestamp_micros is mapped as event_timestamp:
event.timestamp_micros AS event_timestamp
- event_dim.name is mapped as event_name:
event.name AS event_name
- event_param.value.int_value is mapped as event_params.value.int_value:
event_param.value.int_value AS int_value
- user_dim.user_properties is mapped as user_properties, and all its nested values follow the same structure:
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
So, in summary, the schema change has been focused at unnesting several of the fields for simplicity, in such a way that, for example, instead of having to access event_dim.name
(which would require unnesting and complicating the query), you can query directly the field event_name
.
Having this in mind, I am sure you will be able to adapt your query to this new schema, and it will probably look way more simple, given that you will not have to unnest so many fields.
Just for clarification, let me share with you a couple of sample BQ queries comparing the old and the new schema (they are using public Firebase tables, so you should be able to run them out-of-the-box):
# Old Schema - UNNEST() required because there are nested fields
SELECT
user_dim.app_info.app_instance_id,
MIN(event.timestamp_micros) AS min_time,
MAX(event.timestamp_micros) AS max_time,
event.name
FROM
`firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
UNNEST(event_dim) AS event
WHERE
event.name = "user_engagement"
GROUP BY
user_dim.app_info.app_instance_id,
event.name
As compared to:
# New Schema - UNNEST() not required because there are no nested fields
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name
FROM
`firebase-public-project.analytics_153293282.events_20180815`
WHERE
event_name = "user_engagement"
GROUP BY
user_pseudo_id,
event_name
These queries are equivalent, but referencing tables with the old and new schema. Please note that, as your query is more complex, you may need to add some UNNEST() in order to access the remaining nested fields in the table.
Additionally, you may want to have a look at these samples that can help you with some ideas on how to write queries with the new schema.
EDIT 2
My understanding is that a query like the one below should allow you to query for all the fields in a single statement. I am grouping by all the non-aggregated/filtered fields, but depending on your use case (this is definitely something you would need to work on your own) you may want to apply a different strategy in order to be able to query the non-grouped fields (i.e. use a MIN/MAX filter, etc.).
SELECT
user_pseudo_id,
MIN(event_timestamp) AS min_time,
MAX(event_timestamp) AS max_time,
event_name,
par.value.int_value AS engagement_time
FROM
`firebase-public-project.analytics_153293282.events_20180815`,
UNNEST(event_params) as par
WHERE
event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
user_pseudo_id,
event_name,
par.value.int_value
ANNEX
migration_script.sql
:
SELECT
@date AS event_date,
event.timestamp_micros AS event_timestamp,
event.previous_timestamp_micros AS event_previous_timestamp,
event.name AS event_name,
event.value_in_usd AS event_value_in_usd,
user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
(
SELECT
ARRAY_AGG(STRUCT(event_param.key AS key,
STRUCT(event_param.value.string_value AS string_value,
event_param.value.int_value AS int_value,
event_param.value.double_value AS double_value,
event_param.value.float_value AS float_value) AS value))
FROM
UNNEST(event.params) AS event_param) AS event_params,
user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
user_dim.user_id AS user_id,
user_dim.app_info.app_instance_id AS user_pseudo_id,
"" AS stream_id,
user_dim.app_info.app_platform AS platform,
STRUCT( user_dim.ltv_info.revenue AS revenue,
user_dim.ltv_info.currency AS currency ) AS user_ltv,
STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
user_dim.traffic_source.user_acquired_medium AS medium,
user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
STRUCT( user_dim.geo_info.continent AS continent,
user_dim.geo_info.country AS country,
user_dim.geo_info.region AS region,
user_dim.geo_info.city AS city ) AS geo,
STRUCT( user_dim.device_info.device_category AS category,
user_dim.device_info.mobile_brand_name,
user_dim.device_info.mobile_model_name,
user_dim.device_info.mobile_marketing_name,
user_dim.device_info.device_model AS mobile_os_hardware_model,
@platform AS operating_system,
user_dim.device_info.platform_version AS operating_system_version,
user_dim.device_info.device_id AS vendor_id,
user_dim.device_info.resettable_device_id AS advertising_id,
user_dim.device_info.user_default_language AS language,
user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
STRUCT( user_dim.app_info.app_id AS id,
@firebase_app_id AS firebase_app_id,
user_dim.app_info.app_version AS version,
user_dim.app_info.app_store AS install_source ) AS app_info,
(
SELECT
ARRAY_AGG(STRUCT(user_property.key AS key,
STRUCT(user_property.value.value.string_value AS string_value,
user_property.value.value.int_value AS int_value,
user_property.value.value.double_value AS double_value,
user_property.value.value.float_value AS float_value,
user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
FROM
UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
`SCRIPT_GENERATED_TABLE_NAME`,
UNNEST(event_dim) AS event