16

I have two tables, in PostgreSQL if that matters, with one to many relations. I need to join them so that for each "one" I only get single result from the "many" table. Not only that but I need to single out specific results from the "many" table.

TABLE_A
ID  | NAME      | DATE          | MORE COLS....
1   | JOHN      | 2012-01-10    | ....
2   | LIZA      | 2012-01-10    | ....
3   | ANNY      | 2012-01-10    | ....
4   | JAMES     | 2012-01-10    | ....
...

TABLE_B
ID  | CODE1     | CODE2     | SORT
1   | 04020     | 85003     | 1
1   | 04030     | 85002     | 4
2   | 81000     | 80703     | 1
3   | 87010     | 80102     | 4
3   | 87010     | 84701     | 5
4   | 04810     | 85003     | 1
4   | 04030     | 85002     | 4
4   | 04020     | 85003     | 1
...

QUERY RESULT
ID  | NAME      | DATE          | CODE1     | CODE2
1   | JOHN      | 2012-01-10    | 04020     | 85003
2   | LIZA      | 2012-01-10    | 81000     | 80703
3   | ANNY      | 2012-01-10    | 87010     | 80102
4   | JAMES     | 2012-01-10    | 04810     | 85003
...

The SORT column in TABLE_B is actually the last char in CODE2 reordered. CODE2 can end with 1-9 but 3 is most important then 5, 7, 4, 2, 1, 0, 6, 8, 9 hence 3-->1, 5-->2, 7-->3 and so forth.

The problem I'm facing is that I need the row from TABLE_B where sort is the lowest number. In some cases there are multiple lowest case (see ID=4 in TABLE_B) then it doesn't matter which of the rows with lowest ID are selected, only that there is single result for that ID.

John Woo
  • 258,903
  • 69
  • 498
  • 492
thorgilsv
  • 323
  • 1
  • 3
  • 11
  • 1
    Welcome to StackOverflow! Thanks for showing data and writing a clear question. Next time, though, you will make life easier on those who want to help you if you show your data in terms of a `CREATE TABLE` statements and `INSERT` or `COPY` statements to load the data. That way people can easily test candidate answers before they post, to make sure they don't have syntax errors and are getting the results you want. – kgrittn Sep 18 '12 at 03:48
  • Thanks for your comment. I will do that from now on. – thorgilsv Jan 30 '14 at 21:49

3 Answers3

18

Simpler, shorter, faster with PostgreSQL's DISTINCT ON:

SELECT DISTINCT ON (a.id)
       a.id, a.name, a.date, b.code1, b.code2
FROM   table_a a
LEFT   JOIN table_b b USING (id)
ORDER  BY a.id, b.sort;

Details, explanation, benchmark and links:

I use a LEFT JOIN, so that rows from table_a without any matching row in table_b are not dropped. Resulting NULL values in b.sort sort last by default.

Aside:
While being allowed in Postgres, it's unwise to use date as column name. It's a reserved word in every SQL standard and a type name in Postgres.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you, this solution works too. The naming of the columns was solely for descriptive reasons explicitly for this example. – thorgilsv Sep 18 '12 at 12:19
  • 1
    @thorgilsv: I understand. This version should be quite a bit faster. You can test with [`EXPLAIN ANALYZE`](http://www.postgresql.org/docs/current/interactive/sql-explain.html). – Erwin Brandstetter Sep 18 '12 at 12:25
7

PostgreSQL supports window function. Try this,

SELECT d.ID, d.NAME, d.DATE, d.CODE1, d.CODE2
FROM
(
  SELECT  a.ID, a.NAME, a.DATE,
          b.CODE1, b.CODE2,
          ROW_NUMBER() OVER(PARTITION BY a.ID ORDER BY b.SORT ASC, b.CODE2 DESC) AS ROWNUM
  FROM    TableA a
          INNER JOIN TableB b
            ON a.ID = b.ID
) d
WHERE d.RowNum = 1

SQLFiddle Demo

John Woo
  • 258,903
  • 69
  • 498
  • 492
2

Here's what I'd do on SQL Server.

SELECT a.ID,
    a.NAME,
    a.DATE,
    b.CODE1,
    b.CODE2
FROM TABLE_A a
JOIN TABLE_B b
    on a.ID = b.ID
WHERE b.SORT = (SELECT MIN(SORT) 
    FROM TABLE_B
    WHERE ID = b.ID)
Bacon Bits
  • 30,782
  • 5
  • 59
  • 66