0

i have [RecToProcessId, intContactId, vchrCompany, Oldcompany Title, vchrLastName, vchrFirstName, Designation, vchrAddress1, vchrAddress2 vchrPhoneNo, vchrBusinessNo, vchrMobileNo, vchrFaxNo, Email_Old, Email_New, ,Country, vchrBusinessCity, vchrZipCode, STD_Code, ISD_Code, dtUpdateDate, CITY_FLAG, RecordStatus, CompanyUrl,ContactUrl] Columns in one table with more than 50000 records in it i already tried finding duplicates records using Email_New column but there were some junk email too

{SELECT *
into INDIVIDUAL_DEDUPE_DATA
FROM (SELECT *, COUNT(*) OVER (PARTITION BY vchrEmail) AS dup_key
  FROM FINAL_RECEIVED_COMBINED_INDIVIDUAL_UNCLEANED_NON_COMPANY
 ) T
WHERE vchrEmail is not null
and len(vchrEmail)<>0
and dup_key>1
and vchrEmail<>'0'
and vchrEmail<>'-'
and vchrEmail not like '%abc%'
and vchrEmail not like '%xyz%'
and vchrEmail not like '%email%'
and vchrEmail not like '%info%'
and vchrEmail not like '%no@email%'
and vchrEmail not like '%no@gmail%'
and vchrEmail not like '%test%'
and vchrEmail not like '%test@test%'
and vchrEmail not like '%xxx%'
and vchrEmail not like '%xxx@xxx%'
and vchrEmail not like '%xy@%'
ORDER BY vchrEmail}

how can find duplicates on vchrMobileNo and vchrFirstName+vchrLastName as name column or any other ways ?

Richard
  • 106,783
  • 21
  • 203
  • 265
  • Based on your question I think you're looking for different approaches to find duplicates, not the syntax itself. If so my approach would be to better understand the source in how this data was generated. once you know that then you can see how duplicates may occur which would give you alternate approaches to sanitize the data. I don't thin you're asking for help on SQL more method/approach is that right? – xQbert Feb 11 '16 at 13:28
  • http://stackoverflow.com/a/2112639/2912399 – Andrew Paes Feb 11 '16 at 13:29
  • 1
    Possible duplicate of [Finding duplicate rows in SQL Server](http://stackoverflow.com/questions/2112618/finding-duplicate-rows-in-sql-server) – Pavel V. Feb 11 '16 at 13:54
  • Thank you all for answers and sugessions @xQbert-I want help on sql coding only i have tried different ways but i have around 60K records with 42 columns Can any one please help me with finding duplicates on column "vchrMobileNo and vchrFirstName+vchrLastName as one column – Sushant Saldur Feb 12 '16 at 08:15

1 Answers1

2

The answer is to group by...

select count(1), col1, col2
from theTable
group by col1,col2
having count(1) > 1

Will return rows where there is more than one row with the same value for the two specified columns. Once you have this information you can use it as a sub-query to do something with those rows. Eg. to see all the columns in them:

select outer.*
from theTable outer
  inner join (select col1, col2
              from theTable
              group by col1,col2
              having count(1) > 1) as inner
     on outer.col1 = inner.col1 and outer.col2 = inner.col2
Richard
  • 106,783
  • 21
  • 203
  • 265
  • Thank u for ur help @Richard Can u please help me if i have to find duplicates on "vchrMobileNo" column which is having 10 or more digits and is there any way to find duplicates on last 5-6 matching digits ?? – Sushant Saldur Feb 12 '16 at 09:19
  • @SushantSaldur You'll need to use the applicable SQL Server string functions to extract and compare just part of the string in the `group by` and/or `join` clauses. – Richard Feb 12 '16 at 09:48