Below is something I am trying to achieve.
Data from 3 different files are loaded into a single table in which I need to categorize the data based on the names, and check for the best fit of the data across all rows.
The same name can have a maximum of 3 occurrences inside the table and a minimum of 1.
The data comparison should happen on all 3 rows or 2 rows (If the name did not come from one source) if there are more than one occurrences for a given name. If there is only 1 row for a given name that should be taken as the default value.
This is my attempt.
select
case
when (coalesce(length(A.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0) AND coalesce(length(A.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0)) then A.x_manufacturer
when (coalesce(length(B.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(B.x_manufacturer),0) > coalesce(length(C.x_manufacturer),0)) then B.x_manufacturer
when (coalesce(length(C.x_manufacturer),0) > coalesce(length(A.x_manufacturer),0) AND coalesce(length(C.x_manufacturer),0) > coalesce(length(B.x_manufacturer),0)) then C.x_manufacturer
else C.x_manufacturer end as Best_Fit_x_manufacturer
from tbl1 A left outer join tbl1 B on
A.name = B.name
left outer join tbl1 C on C.name = B.name
where A.sourceid=1 and B.sourceid=2 and C.sourceid=3 group by
C.name
Sample Data in Table:
Name ManuFacturer source
A AB 1
A ABC 2
A ABCD 3
B BC 1
Expected Output
Name ManuFacturer source
A ABCD 3
B BC 1