2

I've got two tables let's call them A and B. Their column names are identical. But, B is fully populated where as A is like a half-eaten apple. They have tons of fields but I'm only going to use one to keep it simple. They both have Company Code.

B has duplicates (because people don't like to delete data?). So, unfortunately I can't easily join the two together on company code because I get these repeats.

What I want to do is take only one distinct row from B for each company code. I only want the row with the highest index (it was updated last) for each company code.

Then, I want to join those rows with corresponding company codes in A. But, a little snag is that A has some company codes which are not in B that I need to have in my result set. B has most of the data, so I'm taking data from B based on the matching above. Of course, for those entries where the company code only exists in A the data will have to remain blank for certain fields.

So.. how would I do this? I was half-tempted to just do the logic outside of the SQL call. But, the function I call takes a resultset as a parameter and then generates a file from it. So, I would either have to rewrite the function, or create my own resultSet object putting in the data I want (could get pretty ugly).

Any ideas? I'm using MySQL, but I feel like if someone can give me a highish level solution I should be able to do translate that into MySQL.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Plant More Trees
  • 65
  • 1
  • 1
  • 9

3 Answers3

0

Since MYSQL doesn't support full outer joins, here's what we could do.

Assuming you have an "Index" field that you are using to get a distinct company code.

SELECT B.Company_Code, B.Index, B.OtherField 
  FROM B LEFT JOIN A ON (B.Company_Code = A.Company_Code)
   WHERE EXISTS (SELECT 1 FROM 
                    (SELECT Company_Code CC, 
                            max(Index) as U 
                     FROM B GROUP BY Company_Code) subq
                 WHERE subq.CC = B.Company_Code and subq.U = B.UpdateDate)
UNION
SELECT A.Company_Code, A.Index, A.OtherField 
  FROM A 
 WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.Company_Code = A.Company_Code)

Essentially, get all records from B with the highest index that may also exist on A. Then add to that set of data all records from A where there's no corresponding company code on B.

N West
  • 6,768
  • 25
  • 40
  • It would be neat trick getting that to work considering MySQL doesn't support `FULL OUTER JOIN` – Conrad Frix Feb 01 '12 at 18:19
  • 1
    [Yeah](http://stackoverflow.com/questions/2384298/mysql-full-outer-join-syntax-error) mySQL is weird. – Conrad Frix Feb 01 '12 at 18:22
  • distinct assumes that the duplicates will have the exact same record. i understood it as it has duplicate company_code but rest of the columns may have different records. thus, only take the one with the highest index value. i can be wrong... – sam yi Feb 01 '12 at 18:26
  • sam yi is correct. The other fields are different. Let me add some other columns to give you an idea. There's company code, company name, and contact name. Often, what happens in table B is that contact name was changed and so a new entry was created (I assume). So, the company code and company name are the same but contact name is different and so I need to select one distinct row for each company code (the one that was updated last and thus contains the highest index). – Plant More Trees Feb 01 '12 at 18:31
0
SELECT b.*
FROM   b
JOIN  (
    SELECT c_code, max(index) AS index
    FROM   b
    GROUP  BY c_code
    ) b_max USING (c_code, index)
RIGHT  JOIN a USING (c_code)

Note how I use RIGHT JOIN here because:

A has some company codes which are not in B that I need to have in my result set

You can optimize your output with COALESCE(b.col1, a.col1) AS col1 to fill in with data from a where the column in b IS NULL. But careful with that. Maybe some columns are supposed to be NULL! Then you get incorrect results.

To be more precise, use a CASE statement:

CASE WHEN b.c_code IS NULL THEN a.col1 ELSE b.col1 END AS col1

So you only fall back to values from a where b does not exist altogether.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0
select [column list]
from A
left join (
  select [column list]
  from B
  join (company_code, max(index) maxindex
    from B
    group by company_code) maxcode
    on b.company_code = maxcode.company_code
    and b.index = maxcode.maxindex
) B_distinct
  on A.company_code = B_distinct.company_code
sam yi
  • 4,806
  • 1
  • 29
  • 40