-1

I have a string say 'Hel@1*oO'

Input string -- Hel@1*oO

I want to create a function that will parse through the string 'Hel@1*oO' and replace all characters other than alphanumeric with #.

Basically I want to use regex as [^A-Za-z0-9]. So that other than these characters everything will be replaced with #

The Output will be -- Hel#1#oO

We have REGEX_REPLACE() in Oracle that does the same functionality but I need to get this functionality in SQL Server.

What set of functions can be used to achieve this.

Thanks for the help!

Liam
  • 51
  • 7
  • Like this: https://stackoverflow.com/questions/21378193/regex-pattern-inside-sql-replace-function OR https://www.sqlservercentral.com/blogs/regex-in-sql-server-for-replacing-text ? – xQbert Mar 29 '22 at 13:01
  • Could you share what version of SQL Server you're using? (And that's `SELECT @@VERSION;`, not `Help > About` in Management Studio.) – Aaron Bertrand Mar 29 '22 at 13:09

1 Answers1

1

As you may have found out, T-SQL has no Regex support and thus no support for Regex replacement. You can achieve Regex support with CLR functions if needed, however, I'm not going to cover that here as there are a wealth of resources out there for that already if you want to go down that route.

Assuming, however, you are on a fully supported version of SQL Server, you can use a Tally to break the string into individual characters, and then reaggregate the string with STRING_AGG (if you aren't on a fully supported version, you'll need to use the "old" FOR XML PATH method).

This gives you something like this:

DECLARE @String nvarchar(4000) = N'Hel@1*oO',
        @Pattern nvarchar(100) = N'[^A-Za-z0-9]',
        @ReplacementCharacter nvarchar(1) = '#';

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT TOP(LEN(@String))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1, N N2, N N3, N N4)
SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END,'') WITHIN GROUP (ORDER BY T.I)
FROM Tally T
     CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C);

If you wanted to, you could convert this to an inline table-value function, and then use that against a column (or value):

CREATE OR ALTER FUNCTION dbo.PatternCharacterReplace (@String nvarchar(4000), @Pattern nvarchar(100), @ReplacementCharacter nvarchar(1)) 
RETURNS table
AS RETURN
    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT TOP(LEN(@String))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4)  --4096 rows; For a varchar(8000) or MAX you would need more rows for such lengths
    SELECT STRING_AGG(CASE WHEN V.C LIKE @Pattern THEN @ReplacementCharacter ELSE V.C END,'') WITHIN GROUP (ORDER BY T.I) AS ReplacedString
    FROM Tally T
         CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1)))V(C);
GO

SELECT *
FROM (VALUES(N'Hel@1*oO'),('H0w 4re y0u? :)'))V(S)
     CROSS APPLY dbo.PatternCharacterReplace(V.S,N'[^A-Za-z0-9]',N'#') PCR;

Note that for the function, you may need to create multiple versions for nvarchar and varchar (and possibly explicitly ones for MAX length ones too)

Again, as mentioned, if you need true Regex replacement functionality, you'll need to look into CLR or do the operation outside of SQL Server.

Thom A
  • 88,727
  • 11
  • 45
  • 75