1

Here is result how I want to have

This is the Bigquery according old schema and is working fine in old data source table. Here is the query according to old schema. This is supposed to give to session-duration wise distribution which means it will give session-duration(seconds) wise number of sessions(Ex. it is in the image bleow).

 select (case when engagement_time1 > 0 and engagement_time1 <= 30 then "0-30" else
(case when engagement_time1 > 30 and engagement_time1 <= 60 then "31-60" else 
(case when engagement_time1 > 60 and engagement_time1 <= 180 then "61-180" else
(case when engagement_time1 > 180 and engagement_time1 <= 300 then "181-300" else
(case when engagement_time1 > 300 and engagement_time1 <= 600 then "301-600" else
(case when engagement_time1 > 600 and engagement_time1 <= 1800 then "601-1800" else 
(case when engagement_time1 > 1800 then "1800+" else
"0" end) end) end) end) end) end) end) 
 as engagement_bracket_in_seconds,
count(*) total_sessions
, sum(engagement_time1) total_engagement_time
from
(
SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
   (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds,
       sum(case when name = "user_engagement" then engagement_time else 0 end)/1000 engagement_time1

FROM (
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    SELECT *, IF(
                previous IS null 
                OR (min_time-previous)>(30*60*1000*1000),  # sessions broken by this inactivity 
                1, 0) session_start 
                #https://blog.modeanalytics.com/finding-user-sessions-sql/
    FROM (
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
      )
    )
  )
)
GROUP BY 1, 2

) where sess_id > 0
group by 1
ORDER BY (total_engagement_time/total_sessions)

Result of the old query is right and it is in the figure below

Click to see the result here

AS Bigquery schema has been changed so I tried to replicate the old query according to new schema and I also changed data source table also(which I am sure that all events is getting logged in that table ).

Now the lines of code I added (according to new schema) replacing some part of the old query and I am running it as a whole it's also working.

* added part(acoridng to new schema) *

SELECT L.user_pseudo_id as app_instance_id ,R.min_time min_time,R.max_time max_time,L.event_name,L.eng_time engagement_time FROM  
              (SELECT  user_pseudo_id 
                        ,event_name
                        ,params.value.int_value eng_time
                    FROM `new_abc_datasource`,
                    UNNEST(event_params) as params,
                    UNNEST(user_properties) as user_params
                    where (event_name = "user_engagement" and params.key = "engagement_time_msec")
                    and
                            (user_params.key = "access" and user_params.value.string_value = "true") and
                            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
               ) as L 
               left join (SELECT  user_pseudo_id 
                       , MIN(event_timestamp) AS min_time
                      ,MAX(event_timestamp) AS max_time
                    FROM `analytics_151475732.events_*`,
                    UNNEST(event_params) as params,
                    UNNEST(user_properties) as user_params
                    where (event_name = "user_engagement" and params.key = "engagement_time_msec")
                    and
                            (user_params.key = "access" and user_params.value.string_value = "true") and
                            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
                    GROUP BY user_pseudo_id) as R 
                ON L.user_pseudo_id=R.user_pseudo_id

* replace part of the old query *

 SELECT user_dim.app_info.app_instance_id 
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"

The problem in the new result, number of total_sessions reduced drastically. could someone please guide me saying , Is this right replication of the old query? because result I am getting is different from the result that old used to give.

  • can you simplify both queries? just to focus on what's not working vs the full example – Felipe Hoffa Sep 21 '18 at 11:19
  • @FelipeHoffa sir, I edited some of the part as you asked to clarify. Could now please see and shed some light on it? – subhendu rana Sep 21 '18 at 12:26
  • to clarify, both query is working but after modifying some lines of code (according to new schema which is in the *added part(according to new schema)* section), the modified query as a whole is not giving result as expected. Total sessions decreased drastically than earliear version. – subhendu rana Sep 21 '18 at 12:29
  • @FelipeHoffa https://stackoverflow.com/questions/52454515/how-to-write-bigquery-in-new-schema-with-replacing-event-dim-in-old-schema-from/52516789?noredirect=1#comment92123066_52516789 this is where I cleared up the query I used here – subhendu rana Oct 03 '18 at 06:39
  • It would be helpful if you could provide the schemas or at least the modification made in the schema. – Nathan Nasser Nov 17 '18 at 04:58

0 Answers0