0

I have a table with the following columns:

dbo.SomeInfo
  - Id
  - Name
  - InfoCode

Now I need to update the above table's InfoCode as

Update  dbo.SomeInfo
Set InfoCode= REPLACE(Replace(RTRIM(LOWER(Name)),' ','-'),':','')

This replaces all spaces with - & lowercase the name

When I do check the InfoCode, I see there are Names with some special characters like

Cathe Friedrich''s Low Impact
coffeyfit-cardio-box-&-burn
Jillian Michaels: Cardio

Then I am manually writing the update sql against this as

 Update dbo.SomeInfo
 SET InfoCode= 'cathe-friedrichs-low-impact'
 where Name ='Cathe Friedrich''s Low Impact '

Now, this solution is not realistic for me. I checked the following links related to Regex & others around it.

But none of them is hitting the requirement.

What I need is if there is any character other [a-z0-9] replace it - & also there should not be continuous -- in InfoCode

The above Update sql has set some values of InfoCode as the-dancer's-workout®----starter-package

Some Names have value as

Sleek Technique™
The Dancer's-workout®

How can I write Update sql that could handle all such special characters?

Kgn-web
  • 7,047
  • 24
  • 95
  • 161

3 Answers3

1

Using NGrams8K you could split the string into characters and then rather than replacing every non-acceptable character, retain only certain ones:

SELECT (SELECT '' + CASE WHEN N.token COLLATE Latin1_General_BIN LIKE '[A-z0-9]'THEN token ELSE '-' END
        FROM dbo.NGrams8k(V.S,1) N
        ORDER BY position
        FOR XML PATH(''))
FROM (VALUES('Sleek Technique™'),('The Dancer''s-workout®'))V(S);

I use COLLATE here as on my default collation in my instance the '™' is ignored, therefore I use a binary collation. You may want to use COLLATE to switch the string back to its original collation outside of the subquery.

Thom A
  • 88,727
  • 11
  • 45
  • 75
1

This approach is fully inlinable:

First we need a mock-up table with some test data:

DECLARe @SomeInfo TABLE (Id INT IDENTITY, InfoCode VARCHAR(100));
INSERT INTO @SomeInfo (InfoCode) VALUES
 ('Cathe Friedrich''s Low Impact')
,('coffeyfit-cardio-box-&-burn')
,('Jillian Michaels: Cardio')
,('Sleek Technique™')
,('The Dancer''s-workout®');

--This is the query

WITH cte AS
(
    SELECT 1 AS position
          ,si.Id
          ,LOWER(si.InfoCode) AS SourceText
          ,SUBSTRING(LOWER(si.InfoCode),1,1) AS OneChar
    FROM @SomeInfo si

    UNION ALL

    SELECT cte.position +1
          ,cte.Id
          ,cte.SourceText
          ,SUBSTRING(LOWER(cte.SourceText),cte.position+1,1) AS OneChar
    FROM cte
    WHERE position < DATALENGTH(SourceText)
)
,Cleaned AS
(
    SELECT cte.Id
          ,(
            SELECT CASE WHEN ASCII(cte2.OneChar) BETWEEN 65 AND 90 --A-Z
                          OR ASCII(cte2.OneChar) BETWEEN 97 AND 122--a-z
                          OR ASCII(cte2.OneChar) BETWEEN 48 AND 57 --0-9
                          --You can easily add more ranges
                        THEN cte2.OneChar ELSE '-' 
                          --You can easily nest another CASE to deal with special characters like the single quote in your examples... 
                   END 
            FROM cte AS cte2
            WHERE cte2.Id=cte.Id
            ORDER BY cte2.position
            FOR XML PATH('')
           ) AS normalised
    FROM cte
    GROUP BY cte.Id
)
,NoDoubleHyphens AS
(
    SELECT REPLACE(REPLACE(REPLACE(normalised,'-','<>'),'><',''),'<>','-') AS normalised2
    FROM Cleaned
)
SELECT CASE WHEN RIGHT(normalised2,1)='-' THEN SUBSTRING(normalised2,1,LEN(normalised2)-1) ELSE normalised2 END AS FinalResult 
FROM NoDoubleHyphens;

The first CTE will recursively (well, rather iteratively) travers down the string, character by character and a return a very slim set with one row per character.

The second CTE will then GROUP the Ids. This allows for a correlated sub-query, where the actual check is performed using ASCII-ranges. FOR XML PATH('') is used to re-concatenate the string. With SQL-Server 2017+ I'd suggest to use STRING_AGG() instead.

The third CTE will use a well known trick to get rid of multiple occurances of a character. Take any two characters which will never occur in your string, I use < and >. A string like a--b---c will come back as a<><>b<><><>c. After replacing >< with nothing we get a<>b<>c. Well, that's it...

The final SELECT will cut away a trailing hyphen. If needed you can add similar logic to get rid of a leading hyphen. With v2017+ There was TRIM('-') to make this easier...

The result

cathe-friedrich-s-low-impact
coffeyfit-cardio-box-burn
jillian-michaels-cardio
sleek-technique
the-dancer-s-workout
Shnugo
  • 66,100
  • 9
  • 53
  • 114
0

You can create a User-Defined-Function for something like that.

Then use the UDF in the update.

CREATE FUNCTION [dbo].LowerDashString (@str varchar(255))
RETURNS varchar(255)
AS
BEGIN
    DECLARE @result varchar(255);
    DECLARE @chr varchar(1);
    DECLARE @pos int;
    SET @result = '';
    SET @pos = 1;

    -- lowercase the input and remove the single-quotes
    SET @str = REPLACE(LOWER(@str),'''','');

    -- loop through the characters 
    -- while replacing anything that's not a letter to a dash
    WHILE @pos <= LEN(@str)
    BEGIN

      SET @chr = SUBSTRING(@str, @pos, 1)

      IF @chr LIKE '[a-z]' SET @result += @chr;
      ELSE SET @result += '-';

      SET @pos += 1;
    END;

    -- SET @result = TRIM('-' FROM @result); -- SqlServer 2017 and beyond

    -- multiple dashes to one dash
    WHILE @result LIKE '%--%' SET @result = REPLACE(@result,'--','-');

    RETURN @result;
END;
GO

Example snippet using the function:

-- using a table variable for demonstration purposes
declare @SomeInfo table (Id int primary key identity(1,1) not null, InfoCode varchar(100) not null);

-- sample data
insert into @SomeInfo (InfoCode) values
('Cathe Friedrich''s Low Impact'),
('coffeyfit-cardio-box-&-burn'),
('Jillian Michaels: Cardio'),
('Sleek Technique™'),
('The Dancer''s-workout®');

update @SomeInfo
set InfoCode = dbo.LowerDashString(InfoCode)
where (InfoCode LIKE '%[^A-Z-]%' OR InfoCode != LOWER(InfoCode));

select * 
from @SomeInfo;

Result:

Id  InfoCode
--  -----------------------------
1   cathe-friedrichs-low-impact
2   coffeyfit-cardio-box-burn
3   jillian-michaels-cardio
4   sleek-technique-
5   the-dancers-workout-
LukStorms
  • 28,916
  • 5
  • 31
  • 45