2

I have a table with 4 columns and data looks like

`cust_id firstname lastname  value`

`1       a            b      ct;ct;ct;dir`
`2       c            a      ct;ct;ct;ct;ct;ct`
`3       d            e      ct;ct;ct;dir;st`

i want output like

`cust_id firstname lastname value`

`1       a            b      ct;dir`
`2       c            a      ct`
`3       d            e      ct;dir;st`

Have lots of rows with different number of repetitive words in each.

Any help much appreciated.

Geeme
  • 395
  • 2
  • 6
  • 18
  • What is the source of data of your table? – Gurwinder Singh Jan 04 '17 at 13:18
  • 3
    The best help I can offer is to suggest normalizing your data. create a new table that will hold the values currently stored as delimited text. Than it will be very easy to eliminate duplicate entries. Also, Read [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad), where you will see a lot of reasons why the answer to this question is **Absolutly yes!** – Zohar Peled Jan 04 '17 at 13:20

4 Answers4

3

With the help of a Parse/Split Function and a CROSS apply. I should add that the logic of the PARSE can easily be migrated into the CROSS APPLY if you can't use a UDF

Also, the sequence will be preserved if that matters.

Declare @YourTable table (cust_id int,value varchar(50))
Insert Into @YourTable values
(1,'ct;ct;ct;dir'),
(2,'ct;ct;ct;ct;ct;ct'),
(3,'ct;ct;ct;dir;st')

Select A.*
      ,B.*
 From  @YourTable A
 Cross Apply (
    Select String=Stuff((Select Distinct ';' +RetVal From (
        Select RetVal,RetSeq=min(RetSeq)
          From [dbo].[udf-Str-Parse](A.Value,';') C
          Group By RetVal
    ) X For XML Path ('')),1,1,'') 
  ) B

Returns

cust_id value                String
1       ct;ct;ct;dir         ct;dir
2       ct;ct;ct;ct;ct;ct    ct
3       ct;ct;ct;dir;st      ct;dir;st

The UDF if Needed

CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    Select RetSeq = Row_Number() over (Order By (Select null))
          ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From  (Select x = Cast('<x>'+ replace((Select @String as [*] For XML Path('')),@Delimiter,'</x><x>')+'</x>' as xml).query('.')) as A 
    Cross Apply x.nodes('x') AS B(i)
);
--Select * from [dbo].[udf-Str-Parse]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
--Select * from [dbo].[udf-Str-Parse]('this,is,<test>,for,< & >',',')

Another Parse/Spit function (Returns same results as XML version)

CREATE FUNCTION [dbo].[udf-Str-Parse-8K] (@String varchar(max),@Delimiter varchar(10))
Returns Table 
As
Return (  
    with   cte1(N)   As (Select 1 From (Values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) N(N)),
           cte2(N)   As (Select Top (IsNull(DataLength(@String),0)) Row_Number() over (Order By (Select NULL)) From (Select N=1 From cte1 a,cte1 b,cte1 c,cte1 d) A ),
           cte3(N)   As (Select 1 Union All Select t.N+DataLength(@Delimiter) From cte2 t Where Substring(@String,t.N,DataLength(@Delimiter)) = @Delimiter),
           cte4(N,L) As (Select S.N,IsNull(NullIf(CharIndex(@Delimiter,@String,s.N),0)-S.N,8000) From cte3 S)

    Select RetSeq = Row_Number() over (Order By A.N)
          ,RetVal = Substring(@String, A.N, A.L) 
    From   cte4 A
);
--Orginal Source http://www.sqlservercentral.com/articles/Tally+Table/72993/
--Much faster than str-Parse, but limited to 8K
--Select * from [dbo].[udf-Str-Parse-8K]('Dog,Cat,House,Car',',')
--Select * from [dbo].[udf-Str-Parse-8K]('John||Cappelletti||was||here','||')
John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
  • It is worth noting that you should only use XML to split if you are certain the string will never contain special xml characters (`<`, `>`, `&`), it performs well but is not a good candidate for a generic splitter as a function. The following will fail for example - `Select * from [dbo].[udf-Str-Parse]('ct&;ct;ct;dir;',';')` – GarethD Jan 04 '17 at 14:00
  • Not for all delimiters - run the example I posted. Or this one: `Select * from [dbo].[udf-Str-Parse]('ct&;..',';.')`, or this: `Select * from [dbo].[udf-Str-Parse]('1>2>3','>')` – GarethD Jan 04 '17 at 14:05
  • @GarethD You are correct. the semi-colon can cause a break in the xml normally I see commas – John Cappelletti Jan 04 '17 at 14:05
  • @GarethD Added the Tally-Table Parse .. made only a couple of tweaks to the original – John Cappelletti Jan 04 '17 at 14:11
0

If you want repeated, adjacent instances of ct; to be turned into a single one, you can do:

select replace(replace(replace(col, 'ct;', '><'), '<>', ''), '><', 'ct;')

This assumes that "<" and ">" do not appear in the column. Any two characters can be used for this purpose.

You can also readily put this into an update.

Note: If these are codes of some sort, then you should normalize the data. This problem does appear in other contexts where normalization is not appropriate (for instance, removing consecutive spaces in a string).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

The first thing to point out is that if you stored your data in a normalised fashion then you would not have as much of an issue, the best way would be a separate one to many table, e.g.

CustomerValues

Cust_ID     Value
-------------------
1           ct
1           ct
1           ct
1           dir
2           ct
2           ct
.....

Your query would then become something like:

--SAMPLE DATA
WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b'),
                (2, 'c', 'a'),
                (3, 'd', 'e')
            ) AS t (cust_id, firstname, lastname)
), CustomerValues AS
(   SELECT  *
    FROM    (VALUES
                (1, 'ct'), (1, 'ct'), (1, 'ct'), (1, 'dir'),
                (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'), (2, 'ct'),
                (3, 'ct'), (3, 'ct'), (3, 'ct'), (3, 'dir'), (3, 'st')
            ) AS t (cust_id, value)
)
-- SAMPLE DATA END
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);

For more reading on how the rows are concatenated see Grouped Concatenation in SQL Server

Without your data in this format, you would need to perform some kind of split. For more see Split strings the right way – or the next best way

WITH Customers AS
(   SELECT  *
    FROM    (VALUES
                (1, 'a', 'b', 'ct;ct;ct;dir'),
                (2, 'c', 'a', 'ct;ct;ct;ct;ct;ct'),
                (3, 'd', 'e', 'ct;ct;ct;dir;st')
            ) AS t (cust_id, firstname, lastname, value)
), Numbers (Number) AS
(   SELECT  ROW_NUMBER() OVER(ORDER BY N1.N)
    FROM    (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n1 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n2 (N)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS n3 (N)
), CustomerValues AS
(   SELECT  c.cust_id,
            value = SUBSTRING(c.value, Number,  CHARINDEX(';', c.value + ';', n.Number) - n.Number)
    FROM    Customers AS c
            INNER JOIN Numbers AS n
                ON N.Number <= CONVERT(INT, LEN(c.value))
                AND SUBSTRING(';' + c.value, n.Number, 1) = ';'
)
SELECT  c.cust_id,
        c.firstname,
        c.lastname,
        value = STUFF(CustomerValues.value('.', 'NVARCHAR(MAX)'), 1, 1, '')
FROM    Customers AS c
        CROSS APPLY
        (   SELECT  DISTINCT ';' + value
            FROM    CustomerValues AS cv
            WHERE   cv.cust_id = c.cust_id
            FOR XML PATH(''), TYPE
        ) AS cv (CustomerValues);
GarethD
  • 68,045
  • 10
  • 83
  • 123
0

Here is how you can do:

WITH 
CTE_Sample AS
(
    SELECT 1 AS cust_id, 'a' AS firstname, 'b' AS lastname, 'ct;ct;ct;dir'      AS YourValue UNION ALL
    SELECT 2 AS cust_id, 'c' AS firstname, 'a' AS lastname, 'ct;ct;ct;ct;ct;ct' AS YourValue UNION ALL
    SELECT 3 AS cust_id, 'd' AS firstname, 'e' AS lastname, 'ct;ct;ct;dir;st'   AS YourValue 
),

--
-- Split your values into lines (Distinct values)
CTE_Split AS
(
    SELECT DISTINCT
             YourValue  
            ,value AS Val
          FROM CTE_Sample SS
          CROSS APPLY STRING_SPLIT(YourValue, ';')
)


SELECT 
     cust_id
    ,firstname
    ,lastname
    -- Aggregate your different value into one column
    ,STUFF((
        SELECT ';'+ Val
          FROM CTE_Split SP       
          WHERE SP.YourValue = SA.YourValue
        FOR XML PATH('')
                        ), 1, 1, '' 
    ) AS Val
 FROM CTE_Sample SA