2

I have data which contains user, eventdate and sessions.I want to separate users who had atleast 3 sessions and came back for new session within 3 days.

    user  eventdate   session
    A      2018-02-05   1
    A      2018-02-05   2
    A      2018-02-06   3 
    A      2018-02-10   4

The output the users who had done 3 sessions and then came back for forth session within 3 days.

I tried the following query but it is not giving me the answer that is needed.

 SELECT distinct user, MIN(eventdate) startdate, MAX(eventdate) enddate
FROM (SELECT user, eventdate
      FROM (SELECT user, eventdate
              FROM tablename
             where datediff(startdate,enddate)<=3
             ORDER BY user, eventdate) where sessions>=3) t
 GROUP BY user
 ORDER BY user, startdate;

I know the query has many issues but I am simply unable to figure out how to move forward. Any suggestions?

VSR
  • 87
  • 2
  • 18

2 Answers2

3

Below is for BigQuery Standard SQL

#standardSQL
SELECT *
FROM (
  SELECT 
    user, eventdate, sessions_in_a_day, 
    SUM(sessions_in_a_day) OVER(PARTITION BY user ORDER BY eventdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) total_sessions_before, 
    DATE_DIFF(eventdate, LAG(eventdate) OVER(PARTITION BY user ORDER BY eventdate), DAY) delay
  FROM (
    SELECT user, eventdate, COUNT(1) sessions_in_a_day 
    FROM t
    GROUP BY user, eventdate
  )
)
WHERE total_sessions_before >= 3
AND delay <= 3
-- ORDER BY user, eventdate

You can test / play with above using dummy data

#standardSQL
WITH t AS (
  SELECT 'A' user, DATE '2018-02-05' eventdate, 1 session UNION ALL
  SELECT 'A', DATE '2018-02-05', 2 UNION ALL
  SELECT 'A', DATE '2018-02-06', 3 UNION ALL
  SELECT 'A', DATE '2018-02-06', 4 UNION ALL
  SELECT 'A', DATE '2018-02-09', 5 UNION ALL
  SELECT 'A', DATE '2018-02-09', 6 UNION ALL
  SELECT 'A', DATE '2018-02-10', 7 UNION ALL 
  SELECT 'A', DATE '2018-02-13', 8 
)
SELECT *
FROM (
  SELECT 
    user, eventdate, sessions_in_a_day, 
    SUM(sessions_in_a_day) OVER(PARTITION BY user ORDER BY eventdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) total_sessions_before, 
    DATE_DIFF(eventdate, LAG(eventdate) OVER(PARTITION BY user ORDER BY eventdate), DAY) delay
  FROM (
    SELECT user, eventdate, COUNT(1) sessions_in_a_day 
    FROM t
    GROUP BY user, eventdate
  )
)
WHERE total_sessions_before >= 3
AND delay <= 3
ORDER BY user, eventdate  

result is

Row user    eventdate   sessions_in_a_day   total_sessions_before   delay    

1   A       2018-02-09  2                   4                       3    
2   A       2018-02-10  1                   6                       1    
3   A       2018-02-13  1                   7                       3    

Playing with WHERE clause you can "tune" to whatever case you need
In example above, you show only users who had at least 3 sessions before they reached next session within next 3 days If you are interested in only those who had exactly 3 sessions and reached their fourth session - you can add respective filter

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0
WITH Sess AS
(
select user, session
from tablename
group by  user
HAVING count(session) >= 3
)

select user
from tablename join Sess on tablename.session = Sess.session
group by user
having (datediff(day, min(eventdate), Max(eventdate)) <=3) 
and (min(eventdate) <> Max(eventDate))
DrHouseofSQL
  • 550
  • 5
  • 16