Select top most non-duplicated entry after ordering by other columns

Question

I would like to select the "top most" entry for each row with a duplicated column value.

Performing the following query -

SELECT *
FROM shop
ORDER BY shop.start_date DESC, shop.created_date DESC;

I get the result set -

+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1      | 1       | 2017-02-01 | 2017-01-01   |
| 2      | 1       | 2017-01-01 | 2017-02-01   |
| 3      | 2       | 2017-01-01 | 2017-07-01   |
| 4      | 2       | 2017-01-01 | 2017-01-01   |
+--------+---------+------------+--------------+

Can I modify the SELECT so that I only get back the "top rows" for each unique shop_id -- in this case, row_ids 1 and 3. There can be 1..n number of rows with the same shop_id.

Similarly, if my query above returned the following order, I'd want to only SELECT row_ids 1 and 4 since those would be the "top most" entries each shop_id.

+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1      | 1       | 2017-02-01 | 2017-01-01   |
| 2      | 1       | 2017-01-01 | 2017-02-01   |
| 4      | 2       | 2017-01-01 | 2017-07-01   |
| 3      | 2       | 2017-01-01 | 2017-01-01   |
+--------+---------+------------+--------------+

row 2 has different dates to row 1 (so row 2 is NOT a duplicate of row 1), why is row 2 excluded? — Paul Maxwell, Dec 21 '17 at 02:03
@potashin I would normally make this assumption, and there is no penalty for not making it. — Tim Biegeleisen, Dec 21 '17 at 02:03
@TimBiegeleisen: there is actually - the code being superfluous at least. also the assumption can't be confirmed by the sample data, so it is better to ask — potashin, Dec 21 '17 at 02:07
@potashin Data can change over time, and I can readily imagine making this assumption and the query breaking later on. But good on you for thinking of a reduced way to write the query. — Tim Biegeleisen, Dec 21 '17 at 02:08
@potashin `row_id` is the PK on the table. If I understand your question correctly, the same `row_id` cannot be tied to a different `shop_id` — macbombe, Dec 21 '17 at 02:13

potashin · Accepted Answer · 2017-12-23T00:50:07.143

0

You can do this by using a subquery:

select s.* 
from shop s 
where s.row_id = (
  select row_id 
  from shop 
  where shop_id = s.shop_id 
  order by start_date desc, created_date desc 
  limit 1
)

Mind the assumption of row_id being uniq for each shop_id in this query example.

Demonstration

Or like this:

select t.*
from shop t
join (
 select t2.shop_id, t2.start_date, max(t2.created_date) as created_date 
 from shop t2
 join (
   select max(start_date) as start_date, shop_id
   from shop
   group by shop_id
 ) t3 on t3.shop_id = t2.shop_id and t3.start_date = t2.start_date
 group by t2.shop_id, t2.start_date
) t1 on t1.shop_id = t.shop_id and t.start_date = t1.start_date and t.created_date = t1.created_date

Mind that in case there can be records with the same start_date and created_date for the same shop_id you would need to use another group by s.shop_id, s.start_date, s.created_date in the outer query (adding min(row_id) with other columns listed in the group by in select)

Demonstration

edited Dec 23 '17 at 00:50

answered Dec 21 '17 at 01:51

potashin

44,205
11
83
107

I added a slightly different example that I cannot get to work with your demo. – macbombe Dec 21 '17 at 02:27
@macbombe: updated – potashin Dec 21 '17 at 02:32
Is it possible to do this without using limit? – macbombe Dec 22 '17 at 22:56
@macbombe: yes, it is – potashin Dec 23 '17 at 00:50

score 0 · Answer 2 · answered Dec 21 '17 at 01:54

Try joining to a subquery which finds the "top" rows for each shop_id:

SELECT t1.*
FROM shop t1
INNER JOIN
(
    SELECT shop_id, MIN(row_id) AS min_id
    FROM shop
    GROUP BY shop_id
) t2
    ON t1.shop_id = t2.shop_id AND
       t1.row_id = t2.min_id
ORDER BY
    t1.start_date DESC,
    t1.created_date DESC;

Select top most non-duplicated entry after ordering by other columns

2 Answers2

Demo