duplicate values in left join - sql

Question

I have 2 datasets:

dataset 1:
AMT1    AMT2    AMT3    AMT4    ID
245    955       0      955     1
245     0       245     245     1

dataset 2:
AMT1    AMT2    AMT3    AMT4    ID NAME
245    955       0      955     1  AMY
245     0       245     245     1  AMY


JOIN:

    proc sql;
    create table demo as
    select distinct a.id,
    sum(a.amT1) as exp1,
   sum(a.amT2) as exp2,
   sum(a.amT3) as exp3,
   sum(a.amT4) as exp4,
   MAX(name) as name
    from dataset1 a
    left join dataset2 b
    on a.id = b.id
    group by 1;
    quit;

OUTput: EXPECTED

ID  AMT1    AMT2    AMT3    AMT4    CUSTNAME
1   490     955     245    1200    AMY

Issue: 1)n the left join, even though I am not fetching amount values from dataset2, they get summed up and the output is doubled the expected value. for ex: amt 4 value becomes 2400 instead of 1200. 2)if I use distinct function inside the sum; the scenarios where there are identical values for a whole row gets dropped.

How do I address this issue?

Please in code questions give a [mre]--cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For errors that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) For SQL include DDL & tabular initialization code. When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. [ask] [Help] — philipxy, Oct 21 '21 at 07:07
This seems likely to be a common error where people want some joins, each possibly involving a different key, of some subqueries, each possibly involving join and/or aggregation, but they erroneously try to do all the joining then all the aggregating or to aggregate over previous aggregations. Write separate aggregate subqueries over appropriate rows and/or aggregate (maybe distinct) case statements picking rows of joins; then join the subqueries together. [Two SQL LEFT JOINS produce incorrect result](https://stackoverflow.com/q/12464037/3404097) — philipxy, Oct 21 '21 at 07:10
Please before considering posting read the manual & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect your research. See [ask], [Help] & the voting arrow mouseover texts. If you post a question, use one phrasing as title. — philipxy, Oct 21 '21 at 07:10
Poor sample data. You should at least have one more ID value, which only exists in one of the tables. — jarlh, Oct 21 '21 at 07:19

score 0 · Answer 1 · answered Oct 21 '21 at 06:28

You need to get unique id and name to avoid duplication of the results
For condition d1.id=d2.id you have several matches thus you have duplicates
In the solution below i have used subselect to get unique values cust_name as ( select id, name from dataset2 group by id, name )
You can store sample data in sub selects using with clause https://www.geeksforgeeks.org/sql-with-clause/

with 
  dataset1 as (
    select 245 as amt1, 955 as  amt2,  0   amt3 ,  955 amt4 ,  1 id  union all
    select 245 as amt1, 0 as    amt2,  245 amt3 ,  245 amt4 ,  1 id 
)
, dataset2 as (
    select 245 as amt1, 955 as  amt2,  0   amt3 ,  955 amt4 ,  1 id , 'AMY' name  union all
    select 245 as amt1, 0 as    amt2,  245 amt3 ,  245 amt4 ,  1 id , 'AMY' name
)
, cust_name as (
    select id, name from dataset2 group by id, name
)
select 
     c.id
    ,sum(d1.amt1) amt1
    ,sum(d1.amt2) amt2
    ,sum(d1.amt3) amt3
    ,sum(d1.amt4) amt4
    ,c.name
from 
    dataset1 d1
left join
    cust_name c
on d1.id=c.id

here you can test this sql : https://www.jdoodle.com/execute-sql-online/

Here you don't need join you can get all info from 1 table

with 
 dataset2 as (
    select 245 as amt1, 955 as  amt2,  0   amt3 ,  955 amt4 ,  1 id , 'AMY' name  union all
    select 245 as amt1, 0 as    amt2,  245 amt3 ,  245 amt4 ,  1 id , 'AMY' name
)
select 
     d2.id
    ,sum(d2.amt1) amt1
    ,sum(d2.amt2) amt2
    ,sum(d2.amt3) amt3
    ,sum(d2.amt4) amt4
    ,d2.name
from 
    dataset2 d2

duplicate values in left join - sql

1 Answers1