0

I have 2 tables without identical primary key between them (id exist in both, primary only for table A). I want to use the primary key of the first table A for the ON clause. Therefore I'll have duplicates from the second table B. I want to GROUP BY the duplicates based on some field B.cnt and always take the first one - DESC LIMIT 1.

This is what I tried (DBMS is PostgreSQL):

SELECT 
    scheme1.A.some_attr, 
    B.some_attr
FROM 
    (SELECT * FROM scheme2.B ORDER BY scheme2.B.cnt DESC LIMIT 1) AS B
INNER JOIN
    scheme1.A
ON
    scheme1.A.id = B.id
;

The query returns single record. While the desired behavior is to return single record just for each set of records from B having same id (based on the criteria mentioned). So in total the query of course will return multiple records...

How can I achieve the desired result?

Thanks,

michael
  • 3,835
  • 14
  • 53
  • 90

2 Answers2

1

Your issue should be this line:

(SELECT * FROM scheme2.B ORDER BY scheme2.B.cnt DESC LIMIT 1) scheme2.B

It's being treated as following:

(SELECT * FROM scheme2.B ORDER BY scheme2.B.cnt DESC LIMIT 1) AS scheme2.B

Where scheme.B alias is obviously incorrect, change it following and it should work

SELECT 
    scheme1.A.some_attr, 
    scheme2.B.some_attr
FROM 
    (SELECT * FROM scheme2.B ORDER BY scheme2.B.cnt DESC LIMIT 1) AS B
INNER JOIN
    scheme1.A
ON
    scheme1.A.id = B.id
;

EDIT:

SELECT 
    scheme1.A.some_attr, 
    scheme2.B.some_attr
FROM 
    scheme1.A
LEFT JOIN LATERAL
    (SELECT * FROM scheme2.B WHERE scheme2.B.id = scheme2.A.id ORDER BY scheme2.B.cnt DESC LIMIT 1) AS B ON TRUE
;

If it's a single attribute, you could do the following:

SELECT 
    scheme1.A.some_attr, 
    (
        SELECT 
            scheme2.B.some_attr
        FROM
            scheme2.B
        WHERE
            scheme2.B.id = scheme2.A.id
        ORDER BY scheme2.B.cnt DESC LIMIT 1
    )
FROM 
    scheme1.A
;
Evaldas Buinauskas
  • 13,739
  • 11
  • 55
  • 107
  • It's solved the syntax error, but the query returns in total single record, which is not what I expected. I do want just single record out of each set of duplicate records from B having same id. I'l edit the question accordingly. – michael May 28 '17 at 08:31
  • 1
    I think you're looking for Lateral Join https://stackoverflow.com/questions/11472790/postgres-analogue-to-cross-apply-in-sql-server. I'll my answer accordingly. I'm not a PostgreSQL expert, but this seems like what you're looking for. – Evaldas Buinauskas May 28 '17 at 08:37
  • It worked and returned what seems to be correct number of records, however the attributes from the table B: scheme2.B.some_attr returned all NULLS. – michael May 28 '17 at 08:44
  • data is correct yes, the second option (inner select) works fine – michael May 28 '17 at 09:37
  • 1
    @michael seems that ` ON TRUE` statement has to be added, let me update, and check then :) – Evaldas Buinauskas May 28 '17 at 19:02
1

Use rank() window analytical function, see Postgres Window Functions

SELECT * FROM (
                SELECT          
                scheme1.A.some_attr, 
                scheme2.B.some_attr,                     
                rank() OVER (PARTITION BY B.ID ORDER BY scheme2.B.cnt, scheme2.B.another_attr DESC) as rnk
                FROM 
                                scheme1.A
                INNER JOIN
                                scheme2.B
                ON
                                scheme2.B.id = scheme2.A.id
) A WHERE rnk = 1;
dovka
  • 971
  • 1
  • 8
  • 20