5

Given a table in Google BigQuery:

User  Timestamp 
A     TIMESTAMP(12/05/2015 12:05:01.8023)
B     TIMESTAMP(9/29/2015 12:15:01.0323)
B     TIMESTAMP(9/29/2015 13:05:01.0233)
A     TIMESTAMP(9/29/2015 14:05:01.0432)
C     TIMESTAMP(8/15/2015 5:05:01.0000)
B     TIMESTAMP(9/29/2015 14:06:01.0233)
A     TIMESTAMP(9/29/2015 14:06:01.0432)

Is there a simple way to compute:

User  Maximum_Number_of_Events_this_User_Had_in_One_Hour  
A     2
B     3
C     1

where the time window of one hour is a parameter?

I tried doing this myself using a combination of LAG and partition functions by building off of these two questions:

BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

Bigquery SQL for sliding window aggregate

But find those posts are too dissimilar since I am not finding number of people per time window, but instead finding max number of events per person within a time window.

Community
  • 1
  • 1
cgnorthcutt
  • 3,890
  • 34
  • 41

3 Answers3

7

Here is an efficient succinct way to do it that exploits the ordered structure of timestamps.

SELECT
  user,
  MAX(per_hour) AS max_event_per_hour
FROM
(
  SELECT 
    user,
    COUNT(*) OVER (PARTITION BY user ORDER BY timestamp RANGE BETWEEN 60 * 60 * 1000000 PRECEDING AND CURRENT ROW) as per_hour,
    timestamp
  FROM 
    [dataset_example_in_question_user_timestamps]
)
GROUP BY user
cgnorthcutt
  • 3,890
  • 34
  • 41
  • great use of RANGE. you still need to get MAX though. And one more - timestamp in your example table is sting so you need to cast it to timestamp and then to sec or msec for above to work. Still really very nice! – Mikhail Berlyant Sep 30 '15 at 22:42
2

Try below for GBQ. Haven't tested much, but looks workable to me

SELECT
  User, Max(events) as Max_Events
FROM (
  SELECT 
    b.User as User, 
    b.Timestamp as Timestamp,
    COUNT(1) as Events
  FROM [your_dataset.your_table] as b
  JOIN (
    SELECT User, Timestamp 
    FROM [your_dataset.your_table]
    ) as w 
  ON w.User = b.User
  WHERE ROUND((TIMESTAMP_TO_SEC(TIMESTAMP(w.Timestamp)) - 
               TIMESTAMP_TO_SEC(TIMESTAMP(b.Timestamp))) / 3600, 1) BETWEEN 0 AND 1
  GROUP BY 1, 2
)
GROUP BY 1
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • Thank you for your answer - but I came up with a better one so I'm going to post that. I'll let you look at though before I accept my own answer - see what you think! – cgnorthcutt Sep 30 '15 at 22:19
1

I think you can use a query like this (In T-SQL):

SELECT "User", SUM(s) As Maximum_Number_of_Events_this_User_Had_in_One_Hour  
FROM (
    SELECT "User", 1 s
    FROM yourTable
    GROUP BY "User", CAST("Timestamp" As date), DATEPART(Hour, "Timestamp")) As t
GROUP BY "User"
shA.t
  • 16,580
  • 5
  • 54
  • 111