0

Below is something I am trying to achieve.

Data from 3 different files are loaded into a single table in which I need to categorize the data based on the names, and check for the best fit of the data across all rows.

The same name can have a maximum of 3 occurrences inside the table and a minimum of 1.

The data comparison should happen on all 3 rows or 2 rows (If the name did not come from one source) if there are more than one occurrences for a given name. If there is only 1 row for a given name that should be taken as the default value.

This is my attempt.

select 
case
when (coalesce(length(A.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0) AND coalesce(length(A.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0))  then A.x_manufacturer
when (coalesce(length(B.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(B.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0))  then B.x_manufacturer
when (coalesce(length(C.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(C.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0)) then C.x_manufacturer
 else C.x_manufacturer end as Best_Fit_x_manufacturer
from tbl1 A left outer join tbl1 B on 
A.name = B.name 
left outer join tbl1 C on C.name = B.name 
where A.sourceid=1 and B.sourceid=2 and C.sourceid=3  group by 
  C.name

Sample Data in Table:

Name  ManuFacturer source
A     AB           1
A     ABC          2
A     ABCD         3
B     BC           1

Expected Output

Name      ManuFacturer source
A         ABCD         3
B         BC           1
KevinO
  • 4,303
  • 4
  • 27
  • 36
Karivadha
  • 11
  • 1
  • 1
    Please provide a relevant and minimal sample data showcasing your requirements, and expected output. Refer this link on how to frame a good SQL question: [Why should I provide an MCVE for what seems to me to be a very simple SQL query?](https://meta.stackoverflow.com/q/333952/2469308) – Madhur Bhaiya Nov 15 '18 at 08:58
  • Hi. This is a faq. Please always google many clear, concise & specific versions/phrasings of your question/problem/goal with & without your particular strings/names & read many answers. Add relevant keywords you discover to your searches. If you don't find an answer then post, using 1 variant search as title & keywords for tags. See the downvote arrow mouseover text. When you do have a non-duplicate code question to post please read & act on [mcve]. – philipxy Nov 15 '18 at 10:19
  • Learn what LEFT/RIGHT JOIN returns: INNER JOIN rows plus unmatched left/right table rows extended by NULLs. FULL JOIN returns INNER JOIN rows UNION ALL unmatched left & right table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. A WHERE or INNER ON that requires a possibly NULL-extended column to be not NULL after an OUTER JOIN ON removes any rows extended by NULLs, ie leaves only INNER JOIN rows, ie "turns an OUTER JOIN into an INNER JOIN". You have that. – philipxy Nov 15 '18 at 10:20
  • Possible duplicate of [Left Outer Join Not Working?](https://stackoverflow.com/questions/12467744/left-outer-join-not-working) – philipxy Nov 15 '18 at 10:21
  • Possible duplicate of [How to do a FULL OUTER JOIN in MySQL?](https://stackoverflow.com/q/4796872/3404097) – philipxy Nov 15 '18 at 10:27

1 Answers1

2

As soon as you put a solid condition on an outer joined table, into the WHERE clause, the join reverts to INNER join behavior:

from 
  tbl1 A 
  left outer join tbl1 B on A.name = B.name 
  left outer join tbl1 C on C.name = B.name 
where 
  A.sourceid=1 and 
  B.sourceid=2 and --wrong; this will cause A outer join B to become an INNER join
  C.sourceid=3     --wrong; this will cause B outer join C to become an INNER join

Put your clauses in the ON instead:

from 
  tbl1 A 
  left outer join tbl1 B on A.name = B.name AND B.sourceid=2
  left outer join tbl1 C on C.name = B.name AND C.sourceid=3
where 
  A.sourceid=1

The reason why is:

Outer joins generate NULLs in every column if there's no match between rows, hence B.sourceid might well be null in some rows of the result set. Specifying WHERE B.sourceid=2 causes all the rows where sourceid is null, to disappear, because 2 is not equal to null (nothing is equal to null). This means the only rows that you can possibly get out of it is rows where there IS a match.. Which is an inner join

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • @Karivadha Please see: [How to accept an answer for closure](https://meta.stackexchange.com/a/5235). Thanks :) – Madhur Bhaiya Nov 15 '18 at 11:48
  • One more doubt Caius, What if there are no rows for A.source_id=1 and we have data for B and C? – Karivadha Nov 15 '18 at 11:48
  • Then you'll get no rows. By saying A left join B youre declaring A as your solid source of data onto which you want to attach other possibly matching data. If A might have no matches then you need to FULL OUTER JOIN everything, and don't forget to move the `WHERE a.sourceid = 1` out of the where clause and into the ON – Caius Jard Nov 15 '18 at 14:56