Greatest N Per Group with JOIN and multiple order columns

Question

I have two tables:

Table0:
| ID | TYPE | TIME  | SITE |
|----|------|-------|------|
| aa | 1    | 12-18 | 100  |
| aa | 1    | 12-10 | 101  |
| bb | 2    | 12-10 | 102  |
| cc | 1    | 12-09 | 100  |
| cc | 2    | 12-12 | 103  |
| cc | 2    | 12-01 | 109  |
| cc | 1    | 12-07 | 101  |
| dd | 1    | 12-08 | 100  |

and

Table1:
| ID |
|----|
| aa |
| cc |
| cc |
| dd |
| dd |

I'm trying to output results where:

ID must exist in both tables.
TYPE must be the maximum for each ID.
TIME must be the minimum value for the maximum TYPE for each ID.
SITE should be the value from the same row as the minimum TIME value.

Given my sample data, my results should look like this:

| ID | TYPE | TIME  | SITE |
|----|------|-------|------|
| aa | 1    | 12-10 | 101  |
| cc | 2    | 12-01 | 109  |
| dd | 1    | 12-08 | 100  |

I've tried these statements:

INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
       CASE WHEN MAS.MAB=1 THEN 'B'
            WHEN MAS.MAB=2 THEN 'F'
            ELSE NULL END,
       "TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
     TABLE0 AS MA,
     (SELECT "ID", MAX("TYPE") AS MASTY, MIN("TIME") AS MASTM 
      FROM TABLE0 
      GROUP BY "ID") AS MAS,
WHERE QTS."ID" = MA."ID"
      AND QTS."ID" = MAS."ID"
      AND MSD.MASTY  =MA."TYPE"

...which generates a syntax error

INSERT INTO "NuTable"
SELECT DISTINCT(QTS."ID"), "SITE",
       CASE WHEN MAS.MAB=1 THEN 'B'
            WHEN MAS.MAB=2 THEN 'F'
            ELSE NULL END,
       "TIME"
FROM (SELECT DISTINCT("ID") FROM TABLE1) AS QTS,
     TABLE0 AS MA,
     (SELECT "ID", MAX("TYPE") AS MAB 
      FROM TABLE0 
      GROUP BY "ID") AS MAS,
     ((SELECT "ID", MIN("TIME") AS MACTM, MIN("TYPE") AS MACTY 
       FROM TABLE0 
       WHERE "TYPE" = 1 
       GROUP BY "ID")  
      UNION
      (SELECT "ID", MIN("TIME"), MAX("TYPE") 
       FROM TABLE0 
       WHERE "TYPE" = 2 
       GROUP BY "ID")) AS MACU 
WHERE QTS."ID" = MA."ID"
      AND QTS."ID" = MAS."ID"
      AND MACU."ID" = QTS."ID"
      AND MA."TIME" = MACU.MACTM
      AND MA."TYPE" = MACU.MACTB

... which is getting the wrong results.

Your first query is generating a syntax error, but (unless PostgreSQL is doing something I'm not expecting) not the one you're stating. Side note: If at all possible, don't use quoted identifiers, because they cause other problems. Also, please don't use the implicit-join syntax (the comma-separated `FROM` clause) - always explicitly write out the `JOIN`s, and put the conditions into the relevant `ON` clauses. — Clockwork-Muse, Dec 23 '18 at 07:40
Possible duplicate of [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group). In particular, the PostgreSQL `DISTINCT ON` extension mentioned in the second answer should be all you need. — Clockwork-Muse, Dec 23 '18 at 07:48

Caius Jard · Answer 1 · 2018-12-23T06:53:30.757

Answering your direct question "how to avoid...":

You get this error when you specify a column in a SELECT area of a statement that isn't present in the GROUP BY section and isn't part of an aggregating function like MAX, MIN, AVG

in your data, I cannot say

SELECT
  ID, site, min(time)
FROM
  table
GROUP BY
  id

I didn't say what to do with SITE; it's either a key of the group (in which case I'll get every unique combination of ID,site and the min time in each) or it should be aggregated (eg max site per ID)

These are ok:

SELECT
  ID, max(site), min(time)
FROM
  table
GROUP BY
  id 

SELECT
  ID, site, min(time)
FROM
  table
GROUP BY
  id,site

I cannot simply not specify what to do with it- what should the database return in such a case? (If you're still struggling, tell me in the comments what you think the db should do, and I'll better understand your thinking so I can tell you why it can't do that ). The programmer of the database cannot make this decision for you; you must make it

Usually people ask this when they want to identify:

The min time per ID, and get all the other row data as well. eg "What is the full earliest record data for each id?"

In this case you have to write a query that identifies the min time per id and then join that subquery back to the main data table on id=id and time=mintime. The db runs the subquery, builds a list of min time per id, then that effectively becomes a filter of the main data table

SELECT * FROM
(
  SELECT
    ID, min(time) as mintime
  FROM
    table
  GROUP BY
    id
) findmin
INNER JOIN table t ON t.id = findmin.id and t.time = findmin.mintime

What you cannot do is start putting the other data you want into the query that does the grouping, because you either have to group by the columns you add in (makes the group more fine grained, not what you want) or you have to aggregate them (and then it doesn't necessarily come from the same row as other aggregated columns - min time is from row 1, min site is from row 3 - not what you want)

Looking at your actual problem:

The ID value must exist in two tables. The Type value must be largest group by id. The Time value must be smallest in the largest type group.

Leaving out a solution that involves having or analytics for now, so you can get to grips with the theory here:

You need to find the max type group by id, and then join it back to the table to get the other relevant data also (time is needed) for that id/maxtype and then on this new filtered data set you need the id and min time

SELECT t.id,min(t.time) FROM
(
  SELECT
    ID, max(type) as maxtype
  FROM
    table
  GROUP BY
    id
) findmax
INNER JOIN table t ON t.id = findmax.id and t.type = findmax.maxtype
GROUP BY t.id

If you can't see why, let me know

S-Man · Answer 2 · 2018-12-23T10:26:34.237

demo:db<>fiddle

SELECT DISTINCT ON (t0.id)
    t0.id,
    type,
    time,
    first_value(site) OVER (PARTITION BY t0.id ORDER BY time) as site
FROM table0 t0
JOIN table1 t1 ON t0.id = t1.id
ORDER BY t0.id, type DESC, time

ID must exist in both tables

This can be achieved by joining both tables against their ids. The result of inner joins are rows that exist in both tables.

SITE should be the value from the same row as the minimum TIME value.

This is the same as "Give me the first value of each group ofids ordered bytime". This can be done by using the first_value() window function. Window functions can group your data set (PARTITION BY). So you are getting groups of ids which can be ordered separately. first_value() gives the first value of these ordered groups.

TYPE must be the maximum for each ID.

To get the maximum type per id you'll first have to ORDER BY id, type DESC. You are getting the maximum type as first row per id...

TIME must be the minimum value for the maximum TYPE for each ID.

... Then you can order this result by time additionally to assure this condition.

Now you have an ordered data set: For each id, the row with the maximum type and its minimum time is the first one.

DISTINCT ON gives you exactly the first row of each group. In this case the group you defined is (id). The result is your expected one.

score 0 · Answer 3 · answered Dec 23 '18 at 13:18

0

I would write this using distinct on and in/exists:

select distinct on (t0.id) t0.*
from table0 t0
where exists (select 1 from table1 t1 where t1.id = t0.id)
order by t0.id, type desc, time asc;

answered Dec 23 '18 at 13:18

Gordon Linoff

1,242,037
58
646
786

The type become character varying in the end. Is it possible to order by the string? – finkfink Dec 23 '18 at 13:46
@fink168 . . . Of course you can order by a string. – Gordon Linoff Dec 23 '18 at 14:20

Greatest N Per Group with JOIN and multiple order columns

3 Answers3