1

I am attempting to find the top n records when grouped by multiple attributes. I believe it is related to this problem, but I am having difficulty adapting the solution described to my situation.

To simplify, I have a table with columns (did is short for device_id):

id int
did int
dateVal dateTime

I am trying to find the top n device_id's for each day with the most rows.

For example (ignoring id and the time part of dateTime),

did dateVal
1   2017-01-01
1   2017-01-01
1   2017-01-01
2   2017-01-01
3   2017-01-01
3   2017-01-01

1   2017-01-02
1   2017-01-02
2   2017-01-02
2   2017-01-02
2   2017-01-02
3   2017-01-02

Finding the top 2 would yield...

1, 2017-01-01
3, 2017-01-01
2, 2017-01-02
1, 2017-01-02

My current naive approach is only giving me the top 2 across all dates.

--Using SQLite
select date(dateVal) || did 
from data 
group by date(dateVal), did
order by count(*) desc 
limit 2

I'm using the concatenation operator so that I can later extract the rows.

I am using SQLite, but any general SQL explanation would be appreciated.

peterdn
  • 2,386
  • 1
  • 23
  • 24
Cyruno
  • 13
  • 4

2 Answers2

3

Similarly to this question, define a CTE that computes all device counts for your desired groups, then use it in a WHERE ... IN subquery, limited to the top 2 devices for that date:

WITH device_counts AS (
  SELECT did, date(dateval) AS dateval, COUNT(*) AS device_count
  FROM data
  GROUP BY did, date(dateval)
)
SELECT did, date(dateval) FROM device_counts DC_outer
WHERE did IN (
  SELECT did
  FROM device_counts DC_inner
  WHERE DC_inner.dateval = DC_outer.dateval
  GROUP BY did, date(dateval)
  ORDER BY DC_inner.device_count DESC LIMIT 2
)
ORDER BY date(dateval), did
peterdn
  • 2,386
  • 1
  • 23
  • 24
  • Finally found time to try this method. Fairly certain the volume of data i'm handling is causing the query to hang. I'll leave it running overnight. In the meantime, is it possible to optimize the query somehow? Like maybe not using the date() casting function? – Cyruno Dec 09 '17 at 18:37
  • Possibly. An index on `(did, date)` would probably help as well. Hard to say without knowing the # rows per table and what indexes you already have. – peterdn Dec 10 '17 at 00:59
-1

I tested the query using sql server

select top 2 did, dateVal
from (select *, count(*) as c
      from test
      group by did,dateVal) as t
order by t.c desc 
CL.
  • 173,858
  • 17
  • 217
  • 259
Issaka Faisal
  • 267
  • 3
  • 5