1

I have a table pop with 3 columns: id,level,value:

CREATE TABLE IF NOT EXISTS `pop` (
  `id` int(6),
  `level` varchar(3),
  `value` int(6),
  PRIMARY KEY (`id`,`level`)
) DEFAULT CHARSET=utf8;
INSERT INTO `pop` (`id`, `level`, `value`) VALUES
  ('1', 'MUN', '243'),
  ('1', 'CAN', '335'),
  ('1', 'PRO', '345'),
  ('2', 'DIS', '345'),
  ('2', 'MUN', '243'),
  ('3', 'DIS', '335');

There are ~10M ids, and only 4 levels. All id's have at least 1 level. The levels can be "ordered", even if they are strings: ("DIS"<"MUN"<"CAN"<"PRO").

What I would like to achieve is:

  • for each id, give me the lowest level possible, and its value.

So the expected output is:

  1,'MUN',243
  2,'DIS',345
  3,'DIS',335

I have tried with

SELECT id,
CASE 
when level = 'DIS' THEN level
when level = 'MUN' THEN level 
when level = 'CAN' THEN level
when level = 'PRO' THEN level
END as level,
value FROM `pop` order by id, level

But it's not working, it returns the whole table.

Any ideas?

(Btw I am using SQL in AWS Athena)

ElTitoFranki
  • 375
  • 1
  • 7
  • Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) – ahmed Jun 16 '23 at 10:46
  • Not really because the order of the levels could be different from the alphabetical order, i.e. instead of A,B,C,D levels could be "PRO", "DIS", "MUN", "CAN", with "DIS"<"MUN"<"CAN"<"PRO", so there's not an easy way to obtain the expected outcome in that case. I will edit the question. – ElTitoFranki Jun 16 '23 at 10:51
  • According to what criteria you have `"DIS"<"MUN"<"CAN"<"PRO"`? – ahmed Jun 16 '23 at 10:54
  • I that case, DIS corresponds to District, MUN to Municipality, CAN to Canton, PRO to Province, so a Province contains Cantons that contain Municipalities that contain Districts. does this clarify your question? – ElTitoFranki Jun 16 '23 at 10:56
  • So you have only these 4 levels and you want them in that order? – ahmed Jun 16 '23 at 10:58
  • Yes exactly, that's my case – ElTitoFranki Jun 16 '23 at 10:59
  • You could use the row_number approach with a case expression in the order by clause to specify that order. – ahmed Jun 16 '23 at 11:03
  • Could you give an example of the query please? – ElTitoFranki Jun 16 '23 at 11:08

1 Answers1

1

You could use the row_number function with a case expression in the order by clause to specify the order for levels as the following:

select id, level, value
from
(
  select *,
    row_number() over (partition by id 
          order by 
             case level 
                 when 'DIS' then 1 
                 when 'MUN' then 2 
                 when 'CAN' then 3 
                 when 'PRO' then 4 
              end) rn
  from pop
) t
where rn = 1

demo

ahmed
  • 9,071
  • 3
  • 9
  • 22