0

I have the following code

SELECT 
    PhysicalAddressID
FROM 
    tblPhysicalAddresses  as a
WHERE EXISTS 
(
    SELECT 
        PhysicalAddressID
    FROM 
        tblPhysicalAddresses as b
    WHERE
        a.PhysicalAddressID <> b.PhysicalAddressID AND

        a.Address1 = b.Address1 AND
        a.Address2 = b.Address2 AND
        a.Address3 = b.Address3 AND
        a.City = b.City AND
        a.CountryID = b.CountryID AND
        a.PersonID = b.PersonID AND
        a.PhysicalAddressTypeID = b.PhysicalAddressTypeID AND
        a.PostalCode = b.PostalCode AnD
        a.Province = b.Province AND
        a.StateID = b.StateID AND
        a.ZipCode = b.ZipCode

)

I can visually look at the data and see that there are matches. However when I run the query it returns nothing. Could this be due to the comparison of nullable columns? If so what is the alternative. The goal is to find duplicate rows.

For example an instance where are of the columns are null. There is no match found although they are all null.

Basically I have 80 databases with the same schema. I decided to merge them into one. However there are duplicate addresses. Therefore my goal is to merge duplicate addresses.

Logan Murphy
  • 6,120
  • 3
  • 24
  • 42
  • How could we know?, can you post an example of a couple of rows that should match according to you? – Lamak Jul 21 '14 at 18:32
  • I don't think the presence of nullable columns would cause all results to fail. But are you certain that the rows you think should match, do indeed match? – Siyual Jul 21 '14 at 18:34
  • 1
    You'd need to enhance the predicate logic...something like `(a.Address1 = b.Address1 OR (a.Address1 IS NULL AND b.Address1 IS NULL)) AND (a.Address2 = b.Address2 OR (a.Address2 IS NULL AND b.Address2 IS NULL)) AND...` – Dave Mason Jul 21 '14 at 18:35
  • I was talking about this part of your question: *I can visually look at the data and see that there are matches*. We can't see that data, can you post a couple of those rows so we can see why they are not being returned? – Lamak Jul 21 '14 at 18:38

3 Answers3

3

You can use this technique to evaluate two corresponding nulls as equal

WHERE
    a.PhysicalAddressID <> b.PhysicalAddressID
          AND EXISTS (SELECT a.Address1, a.Address2 
                      INTERSECT
                      SELECT b.Address1, b.Address2)

(Remaining columns left for you to fill out)

Community
  • 1
  • 1
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
2

I would suggest doing this as an aggregation:

SELECT a.Address1, a.Address2, a.Address3, a.City, a.CountryID, a.PersonID,
       a.PhysicalAddressTypeID, a.PostalCode, a.Province, a.StateID, a.ZipCode,
       COUNT(*) as NumPhysicalAddresses,
       MIN(a.PhysicalAddressId), MAX(a.PhysicalAddressId)
FROM tblPhysicalAddresses a
GROUP BY a.Address1, a.Address2, a.Address3, a.City, a.CountryID, a.PersonID,
         a.PhysicalAddressTypeID, a.PostalCode, a.Province, a.StateID, a.ZipCode
HAVING COUNT(*) > 1;

You could be facing many problems in not finding matches:

  • One or more of the columns are NULL (which the group by version will catch).
  • Spaces and/or other non-visible characters in the strings.
  • Similar characters such as capital-O and the number 0.
  • The lack of exact duplicates, although many fields may overlap.
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

Not clear what you want but this might be it

SELECT a.PhysicalAddressID 
  FROM tblPhysicalAddresses as b
  JOIN tblPhysicalAddresses as b
    ON a.PhysicalAddressID <> b.PhysicalAddressID 
   AND (a.Address1 = b.Address1 or (a.Address1 is null and b.Address1 is null))
   AND (a.Address2 = b.Address2 or (a.Address2 is null and a.Address2 is null)) 
   AND (a.Address3 = b.Address3 or (a.Address2 is null and b.Address3 is null)) 
   AND  ...
paparazzo
  • 44,497
  • 23
  • 105
  • 176