0

I have two huge database tables names "AR" and "All", and I am trying to match records in "AR" to "All", note here we don't have a unique identifier, so I am doing a kind of fuzzy matching using First Name, last name, dob and ssn to get the matches. My match query is working.

The All table has a column "MID" which I want to fetch for my every matched record, but when I try my query I get thousands of records. I searched a lot online but could not figure it out.

I am trying to get the first matched record from "All" table along with corresponding MId, for each and every record in my "AR" table. Can anyone help me out here. My Query is below:

Select distinct a.*, 
       r."MID"
  from "public"."AR" a
       inner join "public"."All" r 
          On (    a."cDOB" = r."cDOB" 
              and right(a."SSN",4) = right(r."SSN",4) 
              and left(a."Last Name",4) = left(r."LastName",4) 
              and (a."SSN"!='' or r."SSN"!='')  
             )
             OR 
            (     left(a."First Name",4) = left(r."FirstName",4)
              and (   left(a."Last Name",4) = left(r."LastName",4) 
                   OR right(a."Last Name",4) = right(r."LastName",4)
                  )
              and (   right(a."SSN",4) = r."SSN" 
                   OR a."cDOB" = r."cDOB"
                  ) 
              and (   a."SSN"!='' 
                   OR r."SSN"!=''
                  )
            )
            OR 
            (    a."MelID (Original) " = r."Prp" 
             and a."cDOB" = r."cDOB" 
             and r."Prp"!=''
            );

The query gives me the correct output if I remove r."MID", from the first line, but when I fetch r."MID" the output records are a lot with duplicates and what not.

  • You forgot to provide your table definitions and Postgres version. Some sample data and the desired result would go a long way, too. Also, your description does not add up. `All table has a column "MID" which I want to fetch for my every matched record` contradicts: `trying to get the first matched record from "All" table along with corresponding MId`. So *every* matched row or just the first? Please clarify. And define "first". – Erwin Brandstetter Feb 26 '17 at 06:18

2 Answers2

0

I think the problem is that you're doing an inner join with 3 OR conditions, so you get duplicates when a record matches on more than one of them. Try the below where you left join to the "MID" table 3 times and only keep results where at least one matched.

Select distinct a.*, 
       nvl(nvl(r."MID",r2."MID"),r3."MID") as MID
  from "public"."AR" a
       left join "public"."All" r 
          On (    a."cDOB" = r."cDOB" 
              and right(a."SSN",4) = right(r."SSN",4) 
              and left(a."Last Name",4) = left(r."LastName",4) 
              and (a."SSN"!='' or r."SSN"!='')  
             )
        left join "public"."All" r2 
           On (     left(a."First Name",4) = left(r2."FirstName",4)
              and (   left(a."Last Name",4) = left(r2."LastName",4) 
                   OR right(a."Last Name",4) = right(r2."LastName",4)
                  )
              and (   right(a."SSN",4) = r2."SSN" 
                   OR a."cDOB" = r2."cDOB"
                  ) 
              and (   a."SSN"!='' 
                   OR r2."SSN"!=''
                  )
            )
        left join "public"."All" r3 
            (    a."MelID (Original) " = r3."Prp" 
             and a."cDOB" = r3."cDOB" 
             and r3."Prp"!=''
            )
    WHERE (r."MID" IS NOT NULL OR r2."MID" IS NOT NULL OR r3."MID" IS NOT NULL)
;
Jeremy Real
  • 726
  • 1
  • 6
  • 11
0

To fetch the "first" MID from All for every row in AR, you can use DISTINCT ON:

SELECT DISTINCT ON (a.undisclosed_pk_column)
       a.*, r."MID"
FROM ...
...
ORDER  BY a.undisclosed_pk_column, r.undisclosed_columns_defining_first;

Related:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228