I have 2 datasets:
dataset 1:
AMT1 AMT2 AMT3 AMT4 ID
245 955 0 955 1
245 0 245 245 1
dataset 2:
AMT1 AMT2 AMT3 AMT4 ID NAME
245 955 0 955 1 AMY
245 0 245 245 1 AMY
JOIN:
proc sql;
create table demo as
select distinct a.id,
sum(a.amT1) as exp1,
sum(a.amT2) as exp2,
sum(a.amT3) as exp3,
sum(a.amT4) as exp4,
MAX(name) as name
from dataset1 a
left join dataset2 b
on a.id = b.id
group by 1;
quit;
OUTput: EXPECTED
ID AMT1 AMT2 AMT3 AMT4 CUSTNAME
1 490 955 245 1200 AMY
Issue: 1)n the left join, even though I am not fetching amount values from dataset2, they get summed up and the output is doubled the expected value. for ex: amt 4 value becomes 2400 instead of 1200. 2)if I use distinct function inside the sum; the scenarios where there are identical values for a whole row gets dropped.
How do I address this issue?