SQL, join two tables. ID unique in table1, but repeats in table2. Pick which records to include from table2 based record's fields

Question

I'm trying to join all of the fields from two tables based on an account ID. The account ID is unique in table 1, but is repeated in table 2. In table 2, I only want to pull in one record per ID.

Table 1:

ID...Field1....Field2

1......aa.........bb  
2......cc.........dd  
3......ee.........ff  
4......gg.........hh

Table 2:

ID...Field3....Field4......Field5

1.....123........123...........123  
1.....345........546...........453   
2.....123........123...........123  
3.....123........123...........123  
4.....123........123...........722   
4.....123........123...........333  
4.....123........123...........123

I need to select which ID to pull in from table 2 based on their values in three fields. The logic works like this. Among the duplicates, pick the ID record that has the highest value in field3, if they're all the same, pick the one with the highest value in field4, if they're still the same, pick the one with the highest value in field5.

So that the end result looks like this:

ID....Field1.....Field2.....Field3.....5ield4.....Field5   
1........aa..........bb............345.......546.........452   
2........cc..........dd............123.......123.........123   
3........ee..........ff............123.......123.........123   
4........gg..........hh............123.......123.........722

Thank you so much! This one has stumped me for a while now

Your sample data is not a good test case. Taking the max of each column gives the seemingly correct answer. But your problem is more complex than that. Supplementing table2 with rows like `1,400,100,100` and `2,123,200,100` would make for a better test. — dbenham, Aug 11 '12 at 19:48

Tango_Guy---pretend_web_guy · Accepted Answer · 2012-08-11T21:03:06.790

2

jdevelop has a nice idea, but it doesn't account for the fact that field3 needs to take precedence over field4, etc.

I don't believe there is a rownumber in access, which would simplify things a lot. I haven't used access in a long time, so you may need to modify the below, but it should get you almost all the way there. If this is a 1-time effort, then you can probably sort and then add an autonumber field. if not:

SELECT a.id, d.field3, d.field4, max(d.field5) as m_field5
FROM (((Table1 a
INNER JOIN (
    select id, max(field3) as m3
    from table2
    group by id
    ) b on a.id = b.id                        )
INNER JOIN (
    select id, field3, max(field4) as m4
    from table2
    group by id, field3
    ) c
    on b.id = c.id
    and b.m3 = c.field3                       )
INNER JOIN table2 d
    on c.id = d.id
    and c.field3 = d.field3
    and c.m4 = d.field4                       )
group by a.id, d.field3, d.field4

Note: Edited per dbenham's advice
Note2: Edited to add parenthesis for joins as access requires extra parenthesis when doing multiple joins

If a one time effort or you can otherwise get a rownumber, you could probably do something like:

select id, field3, field4, field5, rownumber
into #temp_ordered
from table2
order by id, field3, field4, field5

select a.*
from #temp_ordered a
inner join (
    select id, min(rownumber) min_rownumber
    from #temp_ordered 
    group by id
    ) b on a.id = b.id and a.rownumber = b.min_rownumber

Best, David

edited Aug 11 '12 at 21:03

answered Aug 11 '12 at 18:35

Tango_Guy---pretend_web_guy

42
7

+1, Very nice solution for a database with only primitive SQL support. I fixed 2 typos that were breaking the query: a is table1, not table2, and b.field4 became d.field4. You can shorten the query by joining directly to table2 d without a subquery and put max(d.field5) in main select and group by in main query. – dbenham Aug 11 '12 at 19:07
Additional comment: You probably want to use the distinct keyword, as you only 1 record to come back. Additionally, if you have nulls, then you'll need to make a conscious decision about how you want those to work. – Tango_Guy---pretend_web_guy Aug 11 '12 at 19:57
I'm getting a "Syntax error in FROM clause" message at the first join ( line when I try to run it. – user1592301 Aug 11 '12 at 20:08
I wondered if it was a typo on my part, but i changed the source table and fields to have the same names as my example so it'd be copy/paste, but I'm still getting that error message at the first join line. Were either of you able to test it in access? Thank you both SO much by the way, I think I'm finally getting close – user1592301 Aug 11 '12 at 20:17
Hi user1592301 - I have tested, and reproduced the problem. Apparently access requires extra parenthesis (http://stackoverflow.com/questions/7854969/sql-multiple-join-statement). This should now be good to go. – Tango_Guy---pretend_web_guy Aug 11 '12 at 21:04
Thank you!!! That works great, I'll just need to do some tweaking since I have quite a few fields in the real tables, but finally something that works well in access! – user1592301 Aug 11 '12 at 21:42

John Woo · Answer 2 · 2012-08-13T06:03:54.213

1

I'm assuming that you are using MSSQL Server for this. You need to use CROSS APPLY in order to solve the problem.

SELECT a.*, x.*
FROM   tb1 a 
CROSS APPLY 
(
    SELECT   TOP 1 *
    FROM     tb2 b
    WHERE    a.id = b.id
    ORDER BY f3 desc, f4 desc, f5 desc
) x

SQLFiddle Demo

edited Aug 13 '12 at 06:03

answered Aug 11 '12 at 14:30

John Woo

258,903
69
498
492

This looks like it should do what I'm trying to accomplish, I'm working in Access 2010 however and am receiving a "Syntax error in FROM clause" message at the Cross Apply line when I try to run it. Any ideas? Thanks again! – user1592301 Aug 11 '12 at 17:51

dbenham · Answer 3 · 2012-08-11T19:37:19.280

(This answer was developed before the OP identified Access as the target database)

The following ANSI syntax is efficient and will work in a wide variety of databases, including:

Oracle 11 and later (as far back as 8i if the CTE is moved to an inline view)
SqlServer 2005 and later
PostgreSQL 8.4 and later

.

with t2 as (
  select id,
         field3,
         field4,
         field5,
         row_number()
           over( partition by id
                 order by field3 desc, field4 desc, field5 desc
               ) as pref
    from table2
)
select t1.id,
       t1.field1,
       t1.field2,
       t2.field3,
       t2.field4,
       t2.field5
  from table1 t1
  join t2
    on t1.id=t2.id
   and t2.pref=1

Oracle has an even more efficient method using KEEP LAST that eliminates the need for any subquery or CTE. I believe SqlServer 2012 may have introduced a similar functionality with different syntax, but I am not familiar with it.

select id,
       field1,
       field2,
       max(field3) keep (dense_rank last order by field3, field4, field5)
       max(field4) keep (dense_rank last order by field3, field4, field5)
       max(field5) keep (dense_rank last order by field3, field4, field5)
  from table1
  join table2 using(id)
 group by id

Here is the SQL Fiddle for both Oracle queries.

Thank you for your response! Unfortunately I'm having to do this in Access 2010 and am having trouble getting the first solution to run. What changes to the syntax should I make to get it to run within access? I think it's having an issue with "with" and "over(" — user1592301, Aug 11 '12 at 18:37

score 0 · Answer 4 · answered Aug 11 '12 at 19:10

0

 SELECT T1.ID,F1,F2,MAX(F3) F3,MAX(F4) F4,MAX(F5) F5
 FROM TBL1 T1,TBL2 T2
 WHERE T1.ID=T2.ID
 GROUP BY T1.ID,F1,F2
 ORDER BY T1.ID;
                  (OR)
 SELECT T1.ID,F1,F2,F3,F4,F5
 FROM TBL1 T1,(SELECT ID,MAX(F3) F3,MAX(F4) F4,MAX(F5) F5 FROM TBL2 GROUP BY ID) T2
 WHERE T1.ID=T2.ID
 ORDER BY T1.ID;

I hope both the solutions will work out.

answered Aug 11 '12 at 19:10

Aspirant

2,238
9
31
43

No dice, they both give the same wrong answer. You cannot simply take the max(F3), independent of F4 and F5. – dbenham Aug 11 '12 at 19:43

SQL, join two tables. ID unique in table1, but repeats in table2. Pick which records to include from table2 based record's fields

4 Answers4

SQLFiddle Demo