1

I have a table with 82,535 rows, where 65,087 rows are unique by ID. When I pull the entire result set of 82,535 and copy to Excel and remove duplicates, it shows that there are 17,448 duplicates. But when I'm using the query below I'm getting different results:

SELECT
    BLD_ID, COUNT(BLD_ID) AS [BLD_ID COUNT] 
FROM
    Project.BreakageAnalysisOutcome_SentToAIM 
GROUP BY 
    BLD_ID 
HAVING 
    COUNT(BLD_ID) > = 2

This query returns a value of 17,364

enter image description here

I know for sure that the number of unique BLD_ID is 65,087

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Data Engineer
  • 795
  • 16
  • 41
  • 1
    You could have some that are triplicate or more. This would change the count. Run the query again with "Having count(BLD_ID) > = 3" –  May 14 '16 at 02:47
  • How did you calculate 17,364 from your query? From your query, Number of duplicates should be sum([BLD_ID COUNT]) - (number of row fetch) so you should rewrite it to get the right number. – Pham X. Bach May 14 '16 at 02:47

2 Answers2

2

Most likely reason fro that is duplicate record may have more than 2 occurrence.

find duplicate count

Select  COUNT(BLD_ID)- COUNT( DISTINCT BLD_ID) 
From Project.BreakageAnalysisOutcome_SentToAIM 
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
sandeep rawat
  • 4,797
  • 1
  • 18
  • 36
0

Use CTE with a Row_Number Function instead of count with group by clause and filer by Row_Number > 1.

;WITH cte AS ( SELECT ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID) AS Rn FROM [Table1] ) DELETE cte WHERE Rn > 1

Data Engineer
  • 795
  • 16
  • 41
  • How does a DELETE statement answer the question that was asked? The question doesn't mention removing rows with duplicate values.It asks about "Select count of duplicates". – spencer7593 May 14 '16 at 13:28