I have a table like the following (example here):
CREATE TABLE topics (
name varchar(64),
url varchar(253),
statistic integer,
pubdate timestamp
);
INSERT INTO topics VALUES
('a', 'b', 100, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'c', 110, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'd', 120, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'e', 90, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'f', 80, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'g', 70, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'h', 150, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'i', 50, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'j', 10, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'k', 11, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'l', 12, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'm', 9, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'n', 8, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'o', 7, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'p', 15, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'q', 5, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'r', 2, TIMESTAMP '2014-04-01 00:00:00')
I would like to take the top two rows based on their statistic
value _from each (name, date(pubdate)
) combination.
In other words, I would like to GROUP BY name, date(pubdate)
, but without an aggregate function, and instead with the result simply taking the top two rows based on their statistic
from each group. (So, I know it is not really a GROUP BY
, but rather a greatest-n-per-group
.)
I'm using Google Big Query with Standard SQL. I've looked at a number of other solutions but am unsure how to achieve the result in this case.
Desired result:
name url statistic date
a b 100 2011-05-16
a e 90 2011-05-16
a h 150 2014-04-01
a d 120 2014-04-01
b m 9 2011-05-16
b k 11 2011-05-16
b l 12 2014-04-01
b p 15 2014-04-01