0

Possible Duplicate:
Combine multiple results in a subquery into a single comma-separated value
Concat groups in SQL Server

I want to be able to get the duplication's removed

SELECT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
Group By Data
HAVING Count(Data) > 1

In MySQL it was as simple as this:

SELECT Count(Data), group_concat(Id)
FROM View_myView
Group By Data
Having Cnt > 1

Does anyone know of a solution? Examples are a plus!

Community
  • 1
  • 1
Jason Foglia
  • 2,414
  • 3
  • 27
  • 48
  • 1
    You can't `group by` something you use in an aggregate function. I think you mean to `group by id` – Devin Burke Jan 20 '12 at 22:15
  • I'm trying to remove dups out of the DB. MSSQL does not make this simple, this is over "searched" and used all the time in DB's task anywhere. There should be a way to prevent dups in the first place, but this was not my DB and app. – Jason Foglia Jan 20 '12 at 23:55

2 Answers2

1

In SQL Server as of version 2005 and newer, you can use a CTE (Common Table Expression) with the ROW_NUMBER function to eliminate duplicates:

;WITH LastPerUser AS
(
   SELECT 
       ID, UserID, ClassID, SchoolID, Created,
       ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS 'RowNum'
   FROM dbo.YourTable
)
SELECT 
   ID, UserID, ClassID, SchoolID, Created,
FROM LastPerUser
WHERE RowNum = 1

This CTE "partitions" your data by UserID, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Created DESC - so the latest row gets RowNum = 1 (for each UserID) which is what I select from the CTE in the SELECT statement after it.

Using the same CTE, you can also easily delete duplicates:

;WITH LastPerUser AS
(
   SELECT 
       ID, UserID, ClassID, SchoolID, Created,
       ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS 'RowNum'
   FROM dbo.YourTable
)
DELETE FROM dbo.YourTable t
FROM LastPerUser cte
WHERE t.ID = cte.ID AND cte.RowNum > 1

Same principle applies: you "group" (or partition) your data by some criteria, you consecutively number all the rows for each data partition, and those with values larger than 1 for the "partitioned row number" are weeded out by the DELETE.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
0

Just use distinct to remove duplicates. It sounds like you were using group_concat to join duplicates without actually wanting to use its value. In that case, MySQL also has a distinct you could have been using:

SELECT DISTINCT Count(Data) as Cnt, Id
FROM [db].[dbo].[View_myView]
GROUP BY Id
HAVING Count(Data) > 1

Also, you can't group by something you use in an aggregate function; I think you mean to group by id. I corrected it in the example above.

Devin Burke
  • 13,642
  • 12
  • 55
  • 82
  • Thank you @Justin Satyr! My intention is to use the Data column to group by. Obviously, MSSQL does not allow you to do that. I'm moving from MySQL to MSSQL. I guess I was spoiled! – Jason Foglia Jan 20 '12 at 23:58