Select count of duplicated rows returns wrong result

Question

I have a table with 82,535 rows, where 65,087 rows are unique by ID. When I pull the entire result set of 82,535 and copy to Excel and remove duplicates, it shows that there are 17,448 duplicates. But when I'm using the query below I'm getting different results:

SELECT
    BLD_ID, COUNT(BLD_ID) AS [BLD_ID COUNT] 
FROM
    Project.BreakageAnalysisOutcome_SentToAIM 
GROUP BY 
    BLD_ID 
HAVING 
    COUNT(BLD_ID) > = 2

This query returns a value of 17,364

I know for sure that the number of unique BLD_ID is 65,087

You could have some that are triplicate or more. This would change the count. Run the query again with "Having count(BLD_ID) > = 3" — , May 14 '16 at 02:47
How did you calculate 17,364 from your query? From your query, Number of duplicates should be sum([BLD_ID COUNT]) - (number of row fetch) so you should rewrite it to get the right number. — Pham X. Bach, May 14 '16 at 02:47

score 2 · Answer 1 · edited May 14 '16 at 07:25

2

Most likely reason fro that is duplicate record may have more than 2 occurrence.

find duplicate count

Select  COUNT(BLD_ID)- COUNT( DISTINCT BLD_ID) 
From Project.BreakageAnalysisOutcome_SentToAIM

edited May 14 '16 at 07:25

marc_s

732,580
175
1,330
1,459

answered May 14 '16 at 04:21

sandeep rawat

4,797
1
18
36

score 0 · Answer 2 · answered May 14 '16 at 12:27

0

Use CTE with a Row_Number Function instead of count with group by clause and filer by Row_Number > 1.

;WITH cte AS ( SELECT ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID) AS Rn FROM [Table1] ) DELETE cte WHERE Rn > 1

answered May 14 '16 at 12:27

Data Engineer

795
16
41

How does a DELETE statement answer the question that was asked? The question doesn't mention removing rows with duplicate values.It asks about "Select count of duplicates". – spencer7593 May 14 '16 at 13:28

Select count of duplicated rows returns wrong result

2 Answers2