Get the 2 options with min value for each each student_id

Question

I have table name m_option:

m_option_id  m_student_id  value
1             1             5
2             1             5
3             1             6
4             1             7
5             2             1
6             2             2
7             2             3
8             2             3
9             2             4

I want to get the 2 rows with min value for each m_student_id:

m_option_id  m_student_id  value
1             1             5
2             1             5
5             2             1
6             2             2

redneb · Accepted Answer · 2016-09-18T01:37:14.083

You can use the row_number window function for that:

SELECT m_option_id, m_student_id, value
FROM (
    SELECT
        m_option_id, m_student_id, value,
        row_number() OVER (PARTITION BY m_student_id ORDER BY value)
    FROM m_option
) t
WHERE
    row_number <= 2;

row_number will calculate the number of each row within its group. We then use that number to filter the the top 2 rows (i.e. lowest value) from each group.

Alternatively, you could use a LATERAL subquery:

SELECT m_option_id, m_student_id, value
FROM (SELECT DISTINCT m_student_id FROM m_option) s,
     LATERAL (
         SELECT m_option_id, value
         FROM m_option
         WHERE s.m_student_id=m_student_id
         ORDER BY value
         LIMIT 2
     ) t;

This will go through all distinct values of m_student_id and for each one of them will find the top 2 rows using a LATERAL subquery.

score 1 · Answer 2 · edited May 23 '17 at 12:02

Assuming there can be many rows per student in table m_option, the key to performance is index usage. And that's most efficient if you have a separate student table listing all students uniquely (which you would typically have). Then:

SELECT m.m_option_id, s.student_id AS m_student_id, m.value
FROM   student s
    ,  LATERAL (
   SELECT m_option_id, value
   FROM   m_option
   WHERE  m_student_id = s.student_id  -- PK of table student
   ORDER  BY value
   LIMIT  2
   ) m;

A multicolumn index on m_option makes this fast:

CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value);

If you can get index-only scans out of it, append the column m_option_id as last index item:

CREATE INDEX m_option_combo_idx ON m_option (m_student_id, value, m_option_id)

Index columns in this order.

Is a composite index also good for queries on the first field?

Distilling a unique list of student_id from m_option would incur an expensive sequential scan over m_option and void any performance benefit.

This excludes students without any related rows in m_option. Use LEFT JOIN LATERAL () ON true to include such students in the result (extended with NULL values for the missing option):

What is the difference between LATERAL and a subquery in PostgreSQL?

If you do not have a student table, the other fast option is a recursive CTE.
Detailed explanation for either variant:

Optimize GROUP BY query to retrieve latest record per user

SQL92 has been out for a while, time to get rid of the old style joins? — Andomar, Sep 18 '16 at 10:15
@Andomar: The comma is [documented](https://www.postgresql.org/docs/current/static/sql-select.html) short notation for `CROSS JOIN` (binding less tightly). Nothing wrong with it. Explicit join conditions are preferable over generic predicates in the `WHERE` clause. But that's moot without join condition. — Erwin Brandstetter, Sep 18 '16 at 12:49
I agree that `cross join` is the case where the old style comma syntax has few disadvantages. But I don't think that justifies keeping SQL89 around. Mixing two styles is inconsistent and confusing for people learning SQL. — Andomar, Sep 18 '16 at 19:00

Get the 2 options with min value for each each student_id

2 Answers2