To select N records per category one can do:
SELECT category, category_id, value FROM
(
SELECT category, value, row_number() OVER (PARTITION by category) as category_id
FROM myTable
)
WHERE category_id < N;
The inner SELECT will first partition the records per category and assign each record per category an id called category_id. The outer query will then use the category_id to limit the number of records it queries per category.
This is extremely inefficient on BIG tables as it will be going through assigning ids to all the records even though we are only interested in N records per category.
The following does not work on the sql engine that I am working with - not sure if it works on any engine at all.
SELECT category, value, row_number() OVER (PARTITION by category) as category_id
FROM myTable
WHERE category_id < N
Does anyone know of any other ways of achieving this with a better time complexity?
More thoughts:
Time profiling the following algorithm against above query might provide more insights as to how the query runs behind the scene:
1. SELECT DISTINCT(category) FROM myTable
2. FOREACH category SELECT N rows
more info:
my data is physically partitioned by category
, being able to explicitly leverage that would be useful