Approach
The following approach can be used to de-duplicate a delimited list of values.
- Use the
REPLACE()
function to convert different delimiters into the same delimiter.
- Use the
REPLACE()
function to inject XML closing and opening tags to create an XML fragment
- Use the
CAST(expr AS XML)
function to convert the above fragment into the XML data type
- Use
OUTER APPLY
to apply the table-valued function nodes()
to split the XML fragment into its constituent XML tags. This returns each XML tag on a separate row.
- Extract just the value from the XML tag using the
value()
function, and returns the value using the specified data type.
- Append a comma after the above-mentioned value.
- Note that these values are returned on separate rows. The usage of the
DISTINCT
keyword now removes duplicate rows (i.e. values).
- Use the
FOR XML PATH('')
clause to concatenate the values across multiple rows into a single row.
Query
Putting the above approach in query form:
SELECT DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)') + ','
FROM (
-- This query returns the following in theDataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
-- Running the query without the following line will return the data in separate rows
-- Running the query with the following line returns the rows concatenated, i.e. it returns:
-- test1,test2,test3,test4,
FOR XML PATH('')
Input & Result
Given the input:
test1,test2,test1|test2,test3|test4,test4|test4
The above query will return the result:
test1,test2,test3,test4,
Notice the trailing comma at the end. I'll leave it as an exercise to you to remove that.
EDIT: Count of Duplicates
OP requested in a comment "how do i get t5he count of duplicates as well? in a seperate column".
The simplest way would be to use the above query but remove the last line FOR XML PATH('')
. Then, counting all values and distinct values returned by the SELECT
expression in the above query (i.e. PivotedTable.PivotedColumn.value('.','nvarchar(max)')
). The difference between the count of all values and the count of distinct values is the count of duplicate values.
SELECT
COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfAllValues
, COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfUniqueValues
-- The difference of the previous two counts is the number of duplicate values
, COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)'))
- COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfDuplicateValues
FROM (
-- This query returns the following in theDataXml column:
-- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
-- i.e. it has turned the original delimited data into an XML fragment
SELECT
DataTable.DataColumn AS DataRaw
, CAST(
'<tag>'
-- First replace commas with pipes to have only a single delimiter
-- Then replace the pipe delimiters with a closing and opening tag
+ replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
-- Add a final set of closing tags
+ '</tag>'
AS XML) AS DataXml
FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
For the same input shown above, the output of this query is:
CountOfAllValues CountOfUniqueValues CountOfDuplicateValues
---------------- ------------------- ----------------------
8 4 4