Here is result how I want to have
This is the Bigquery according old schema and is working fine in old data source table. Here is the query according to old schema. This is supposed to give to session-duration wise distribution which means it will give session-duration(seconds) wise number of sessions(Ex. it is in the image bleow).
select (case when engagement_time1 > 0 and engagement_time1 <= 30 then "0-30" else
(case when engagement_time1 > 30 and engagement_time1 <= 60 then "31-60" else
(case when engagement_time1 > 60 and engagement_time1 <= 180 then "61-180" else
(case when engagement_time1 > 180 and engagement_time1 <= 300 then "181-300" else
(case when engagement_time1 > 300 and engagement_time1 <= 600 then "301-600" else
(case when engagement_time1 > 600 and engagement_time1 <= 1800 then "601-1800" else
(case when engagement_time1 > 1800 then "1800+" else
"0" end) end) end) end) end) end) end)
as engagement_bracket_in_seconds,
count(*) total_sessions
, sum(engagement_time1) total_engagement_time
from
(
SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
(ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds,
sum(case when name = "user_engagement" then engagement_time else 0 end)/1000 engagement_time1
FROM (
SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
FROM (
SELECT *, IF(
previous IS null
OR (min_time-previous)>(30*60*1000*1000), # sessions broken by this inactivity
1, 0) session_start
#https://blog.modeanalytics.com/finding-user-sessions-sql/
FROM (
SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
FROM (
SELECT user_dim.app_info.app_instance_id
, (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
, (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
event.name,
params.value.int_value engagement_time
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
UNNEST(event_dim) as event,
UNNEST(event.params) as params,
UNNEST(user_dim.user_properties) as user_params
where (event.name = "user_engagement" and params.key = "engagement_time_msec")
and
(user_params.key = "access" and user_params.value.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
)
)
)
)
GROUP BY 1, 2
) where sess_id > 0
group by 1
ORDER BY (total_engagement_time/total_sessions)
Result of the old query is right and it is in the figure below
AS Bigquery schema has been changed so I tried to replicate the old query according to new schema and I also changed data source table also(which I am sure that all events is getting logged in that table ).
Now the lines of code I added (according to new schema) replacing some part of the old query and I am running it as a whole it's also working.
* added part(acoridng to new schema) *
SELECT L.user_pseudo_id as app_instance_id ,R.min_time min_time,R.max_time max_time,L.event_name,L.eng_time engagement_time FROM
(SELECT user_pseudo_id
,event_name
,params.value.int_value eng_time
FROM `new_abc_datasource`,
UNNEST(event_params) as params,
UNNEST(user_properties) as user_params
where (event_name = "user_engagement" and params.key = "engagement_time_msec")
and
(user_params.key = "access" and user_params.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
) as L
left join (SELECT user_pseudo_id
, MIN(event_timestamp) AS min_time
,MAX(event_timestamp) AS max_time
FROM `analytics_151475732.events_*`,
UNNEST(event_params) as params,
UNNEST(user_properties) as user_params
where (event_name = "user_engagement" and params.key = "engagement_time_msec")
and
(user_params.key = "access" and user_params.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
GROUP BY user_pseudo_id) as R
ON L.user_pseudo_id=R.user_pseudo_id
* replace part of the old query *
SELECT user_dim.app_info.app_instance_id
, (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
, (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
event.name,
params.value.int_value engagement_time
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
UNNEST(event_dim) as event,
UNNEST(event.params) as params,
UNNEST(user_dim.user_properties) as user_params
where (event.name = "user_engagement" and params.key = "engagement_time_msec")
and
(user_params.key = "access" and user_params.value.value.string_value = "true") and
PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
The problem in the new result, number of total_sessions reduced drastically. could someone please guide me saying , Is this right replication of the old query? because result I am getting is different from the result that old used to give.