0

I would like to select the "top most" entry for each row with a duplicated column value.

Performing the following query -

SELECT *
FROM shop
ORDER BY shop.start_date DESC, shop.created_date DESC;

I get the result set -

+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1      | 1       | 2017-02-01 | 2017-01-01   |
| 2      | 1       | 2017-01-01 | 2017-02-01   |
| 3      | 2       | 2017-01-01 | 2017-07-01   |
| 4      | 2       | 2017-01-01 | 2017-01-01   |
+--------+---------+------------+--------------+

Can I modify the SELECT so that I only get back the "top rows" for each unique shop_id -- in this case, row_ids 1 and 3. There can be 1..n number of rows with the same shop_id.

Similarly, if my query above returned the following order, I'd want to only SELECT row_ids 1 and 4 since those would be the "top most" entries each shop_id.

+--------+---------+------------+--------------+
| row_id | shop_id | start_date | created_date |
+--------+---------+------------+--------------+
| 1      | 1       | 2017-02-01 | 2017-01-01   |
| 2      | 1       | 2017-01-01 | 2017-02-01   |
| 4      | 2       | 2017-01-01 | 2017-07-01   |
| 3      | 2       | 2017-01-01 | 2017-01-01   |
+--------+---------+------------+--------------+
macbombe
  • 3
  • 2
  • could there be same `row_id`s for different `shop_id`s? – potashin Dec 21 '17 at 01:57
  • row 2 has different dates to row 1 (so row 2 is NOT a duplicate of row 1), why is row 2 excluded? – Paul Maxwell Dec 21 '17 at 02:03
  • @potashin I would normally make this assumption, and there is no penalty for not making it. – Tim Biegeleisen Dec 21 '17 at 02:03
  • @TimBiegeleisen: there is actually - the code being superfluous at least. also the assumption can't be confirmed by the sample data, so it is better to ask – potashin Dec 21 '17 at 02:07
  • @potashin Data can change over time, and I can readily imagine making this assumption and the query breaking later on. But good on you for thinking of a reduced way to write the query. – Tim Biegeleisen Dec 21 '17 at 02:08
  • @potashin `row_id` is the PK on the table. If I understand your question correctly, the same `row_id` cannot be tied to a different `shop_id` – macbombe Dec 21 '17 at 02:13
  • @macbombe: yes, this is exactly what I was asking about – potashin Dec 21 '17 at 02:15

2 Answers2

0

You can do this by using a subquery:

select s.* 
from shop s 
where s.row_id = (
  select row_id 
  from shop 
  where shop_id = s.shop_id 
  order by start_date desc, created_date desc 
  limit 1
)   

Mind the assumption of row_id being uniq for each shop_id in this query example.

Demonstration

Or like this:

select t.*
from shop t
join (
 select t2.shop_id, t2.start_date, max(t2.created_date) as created_date 
 from shop t2
 join (
   select max(start_date) as start_date, shop_id
   from shop
   group by shop_id
 ) t3 on t3.shop_id = t2.shop_id and t3.start_date = t2.start_date
 group by t2.shop_id, t2.start_date
) t1 on t1.shop_id = t.shop_id and t.start_date = t1.start_date and t.created_date = t1.created_date

Mind that in case there can be records with the same start_date and created_date for the same shop_id you would need to use another group by s.shop_id, s.start_date, s.created_date in the outer query (adding min(row_id) with other columns listed in the group by in select)

Demonstration

potashin
  • 44,205
  • 11
  • 83
  • 107
0

Try joining to a subquery which finds the "top" rows for each shop_id:

SELECT t1.*
FROM shop t1
INNER JOIN
(
    SELECT shop_id, MIN(row_id) AS min_id
    FROM shop
    GROUP BY shop_id
) t2
    ON t1.shop_id = t2.shop_id AND
       t1.row_id = t2.min_id
ORDER BY
    t1.start_date DESC,
    t1.created_date DESC;

enter image description here

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360