61

I'm trying to join two tables using a left-join. And the result set has to include only the first record from the "right" joined table.

Lets say I have two tables A and B as below;

Table "A"

code | emp_no

101  | 12222
102  | 23333
103  | 34444
104  | 45555
105  | 56666

Table "B"

code | city       | county
101  | Glen Oaks  | Queens
101  | Astoria    | Queens
101  | Flushing   | Queens
102  | Ridgewood  | Brooklyn
103  | Bayside    | New York

Expected Output:

code | emp_no | city      | county
101  | 12222  | Glen Oaks | Queens
102  | 23333  | Ridgewood | Brooklyn
103  | 34444  | Bayside   | New York
104  | 45555  | NULL      | NULL
105  | 56666  | NULL      | NULL

If you notice my result has only the one matched record from table "B"(doesn't matter what record is matched) after left join (and it is a one to many mapping)

I need to pick the first matched record from table B and ignore all other rows.

Please help!

Thanks

RayLoveless
  • 19,880
  • 21
  • 76
  • 94
Sandra
  • 691
  • 2
  • 6
  • 4
  • 3
    Rows have no implied order in SQL. How do you decide which one is the "first matched record"? – NullUserException Nov 17 '11 at 02:42
  • if we order by city on table B astoria becomes the 1st record, How do we know which is "first" – xQbert Nov 17 '11 at 02:42
  • 4
    For code 101, the matched row could be any of the 3 rows. It doesn't matter which row to be selected. So it could be either Glan Oaks or Astoria or Flushing in the result set. – Sandra Nov 17 '11 at 02:44

7 Answers7

43

After playing around a bit, this turns out to be trickier than I'd expected! Assuming that table_b has some single column that is unique (say, a single-field primary key), it looks like you can do this:

SELECT table_a.code,
       table_a.emp_no,
       table_b.city,
       table_b.county
  FROM table_a
  LEFT
  JOIN table_b
    ON table_b.code = table_a.code
   AND table_b.field_that_is_unique =
        ( SELECT TOP 1
                 field_that_is_unique
            FROM table_b
           WHERE table_b.code = table_a.code
       )
;
ruakh
  • 175,680
  • 26
  • 273
  • 307
  • Ah I see. I suppose I can get rid of the `is null` test from my query too. – Raihan Nov 17 '11 at 04:05
  • This answer worked well for me, and is far more elegant than some of the other subquery solutions presented across the Internet regarding this problem. Thank you very much! – Jody Bruchon Apr 20 '15 at 16:17
13

Another option: OUTER APPLY

If supported by the database, OUTER APPLY is an efficient and terse option.

SELECT *
FROM 
    Table_A a
OUTER APPLY
    (SELECT TOP 1 * 
    FROM Table_B b_1
    WHERE b_1.code = a.code
    ) b
;

This results in a left join to the indeterminate first matched record. My tests show it to be quicker than any other posted solution (on MS SQL Server 2012).

QuintinDB
  • 133
  • 1
  • 5
  • Important to note that unlike the accepted answer, this will work with SQLCE. The accepted answer uses scalar queries which are unsupported. – statler Jan 16 '17 at 10:08
10

The highest voted answer does not seem correct to me, and seems overcomplicated. Just group by the code field on table B in your subquery and select the maximum Id per grouping.

SELECT 
    table_a.code,
    table_a.emp_no,
    table_b.city,
    table_b.county
FROM 
    table_a
    LEFT JOIN 
        table_b
        ON table_b.code = table_a.code
        AND table_b.field_that_is_unique IN
            (SELECT MAX(field_that_is_unique)
             FROM table_b
             GROUP BY table_b.code)
CShark
  • 2,183
  • 1
  • 24
  • 42
  • 1
    While this seems to be a working solution, it implies that you want to select a row with a maximum value, whereas the OP just wants to have ANY one row. Thus, the answer by @ruakh seems to be more precise (and probably a little faster as there is no aggregation function in use) – Marcel May 04 '16 at 08:47
  • 2
    Thanks for the feedback, Marcel. The maximum value is as good as any other when it does not matter to the OP which row is selected (not to be confused with "random") - I fail to see how this makes the accepted answer more precise. In terms of speed the accepted answer might be faster, but it comes at the cost of complexity. Unless you are going to run this query exhaustively, the difference in performance is negligible and the simplicity of my answer preferable. – CShark May 05 '16 at 16:02
  • You can also use and = instead of a IN for the subquery – Isan Rodriguez Trimiño Apr 24 '20 at 16:15
4

I modified the answer from ruakh and this seem to work perfectly with mysql.

SELECT 
   table_a.code,
   table_a.emp_no,
   table_b.city,
   table_b.county
FROM table_a a
LEFT JOIN table_b b
ON b.code = a.code 
AND b.id = (  SELECT id FROM table_b 
              WHERE table_b.code = table_a.code 
              LIMIT 1
           )
;
Jason Washo
  • 536
  • 6
  • 22
Gayan
  • 41
  • 1
4

If you are on SQL Server 2005 or later version, you could use ranking to achieve what you want. In particular, ROW_NUMBER() seems to suit your needs nicely:

WITH B_ranked AS (
  SELECT
    *,
    rnk = ROW_NUMBER() OVER (PARTITION BY code ORDER BY city)
  FROM B
)
SELECT
  A.code,
  A.emp_no,
  B.city,
  B.county
FROM A
  LEFT JOIN B_ranked AS B ON A.code = B.code AND b.rnk = 1

OR

WITH B_unique_code AS (
  select * from(
     SELECT
      *,
      rnk = ROW_NUMBER() OVER (PARTITION BY code ORDER BY city)
      FROM B
     ) AS s
  where rnk = 1
)
SELECT
  A.code,
  A.emp_no,
  B.city,
  B.county
FROM A
  LEFT JOIN B_unique_code AS B ON A.code = B.code
Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • @RayL: Thanks for the alternative version. I didn't realise the `AND b.rnk = 1` bit could be unclear. (Thought I'd formatted it the way that wouldn't cause any confusion, but oh well.) – Andriy M Jan 10 '14 at 22:05
  • For *some* the alternative may be easier to read. I like my alternative but it may not be easier to read for most. – RayLoveless Jan 13 '14 at 21:08
1

In Oracle you can do:

WITH first_b AS (SELECT code, min(rowid) AS rid FROM b GROUP BY code)) 
SELECT a.code, a.emp_no, b.city, b.county
FROM a 
INNER JOIN first_b 
 ON first_b.code = a.code
INNER JOIN b
 ON b.rowid = first_b.rid
kevin cline
  • 2,608
  • 2
  • 25
  • 38
1

this is how:

 Select * From TableA a
     Left Join TableB b
         On b.Code = a.Code 
             And [Here put criteria predicate that 'defines' what the first record is]

Hey, if the city and county are unique, then use them

   Select * From TableA a
     Left Join TableB b
         On b.Code = a.Code 
             And b.City + b.county =
                  (Select Min(city + county)
                   From TableB 
                   Where Code = b.Code)

But the point is you have to put some expression in there to tell the query processor what it means to be first.

Charles Bretana
  • 143,358
  • 22
  • 150
  • 216
  • For code 101, the matched row could be any of the 3 rows. It doesn't matter which row to be selected. So it could be either Glan Oaks or Astoria or Flushing in the result set – Sandra Nov 17 '11 at 02:47
  • @Sandra: Which dialect of SQL? Different ones have different ways of selecting just one row (`WHERE ROWNUM = 1` for Oracle, `LIMIT 1` for MySQL and PostgreSQL, `TOP 1` for SQLServer, and so on). – ruakh Nov 17 '11 at 02:54