How to pick a specific row from a group?

Question

I have a table with several columns:

id |  Source  |  Mode   | SponsorID | Code  | ...
1    source1    Paper     123        7102
2    source2    Paper     123        7102
3    source2    Paper     123        7101
4    source1    Paper     123        7101
5    source2    Paper     123        1010
6    source1    Paper     345        1010
7    source2    Paper     345        7102
8    source1    Paper     345        1010
9    source2    Paper     345        7102
10   source1    Paper     345        7102
11   source1    Paper     678        1010
12   source2    Paper     678        1010
13   source1    Paper     678        1010
14   source2    Paper     678        1010
15   source1    Paper     678        1010

I want to group the above records by SponsorID and only want to pick one record based on the value in code column.
In the SponsorID group I only want to pick the record that has code 7101. If that value does not exist in the code column I want to be able to pick the record with value 7102 in the code column. If 7102 does not exist I want to pick the record with 1010. So the final output should look something like:

1    source1    Paper     123        7101
2    source2    Paper     345        7102
3    source1    Paper     678        1010

I have tried using partitions and case but no success.
Would greatly appreciate if someone could help.

Why 'source1' for sponsor 123 where 'source2' comes "first", but 'source2' for sponsor 345? IOW: how to break ties? And what to do if neither of the three given codes exists (even if that *should* never happen)? — Erwin Brandstetter, Apr 06 '20 at 23:49
the code column must have a value. So it will either one of the three or a different value which has to be given a priority too. The source column is not used to pick the required order. The code 7101 can be sent by other sources by error and we are trying to eliminate that. — swetha katipalli, Apr 10 '20 at 19:13

score 0 · Answer 1 · answered Apr 06 '20 at 21:07

0

Use distinct on and some filtering and ordering:

select distinct on (SponsorID) t.*
from t
where code in (7101, 7102, 1010)
order by SponsorId, (case code when 7101 then 1 when 7102 then 2 else 3 end);

answered Apr 06 '20 at 21:07

Gordon Linoff

1,242,037
58
646
786

Erwin Brandstetter · Answer 2 · 2020-04-09T01:36:40.993

DISTINCT ON, like Gordon provided, is perfectly valid and simple. Basics:

Select first row in each GROUP BY group?

You might use a construct with unnest() and WITH ORDINALITY and only provide an array with codes sorted by your preference:

SELECT DISTINCT ON (SponsorID) t.*
FROM   unnest('{7101, 7102, 1010}'::int[]) WITH ORDINALITY x(code, ord)
JOIN   tbl t USING (code)
ORDER  BY t.SponsorID, x.ord;

See:

PostgreSQL unnest() with element number

A simple index helps performance if codes of interest are not too common:

CREATE INDEX ON tbl (code);

But your sample values suggest there are many rows per (SponsorID, code). So 2 things:

1. Correctness

Define which row to pick when there are multiple with code = 7101 etc. Something like "smallest id first". You can simply append another ORDER BY expressions to the query from above:

...
ORDER  BY t.SponsorID, x.ord, t.id;

But consider ...

2. Performance

Assuming there is a table sponsor with 1 row per sponsor (which you would typically have), consider this more sophisticated query:

SELECT t.*
FROM   sponsor s
CROSS  JOIN LATERAL (
             (SELECT * FROM tbl t WHERE t.SponsorID = s.id AND code = 7101 ORDER BY id LIMIT 1) -- ORDER BY id???
   UNION ALL (SELECT * FROM tbl t WHERE t.SponsorID = s.id AND code = 7102 ORDER BY id LIMIT 1)
   UNION ALL (SELECT * FROM tbl t WHERE t.SponsorID = s.id AND code = 1010 ORDER BY id LIMIT 1)
   LIMIT 1
   ) t
;

db<>fiddle here

In cooperation with this index:

CREATE INDEX ON tbl (SponsorID, code, id);

Now we get fast index scans, only reading rows we actually need.

If you don't care about "smallest id first", trim that from query and index.

There are even ways without a sponsor table, but we are getting in too deep ... See:

Optimize GROUP BY query to retrieve latest row per user

How to pick a specific row from a group?

2 Answers2

1. Correctness

2. Performance