14

Any one know a good way to remove punctuation from a field in SQL Server?

I'm thinking

UPDATE tblMyTable SET FieldName = REPLACE(REPLACE(REPLACE(FieldName,',',''),'.',''),'''' ,'')

but it seems a bit tedious when I intend on removing a large number of different characters for example: !@#$%^&*()<>:"

Thanks in advance

Ev.
  • 7,109
  • 14
  • 53
  • 87

8 Answers8

18

Ideally, you would do this in an application language such as C# + LINQ as mentioned above.

If you wanted to do it purely in T-SQL though, one way make things neater would be to firstly create a table that held all the punctuation you wanted to removed.

CREATE TABLE Punctuation 
(
    Symbol VARCHAR(1) NOT NULL
)

INSERT INTO Punctuation (Symbol) VALUES('''')
INSERT INTO Punctuation (Symbol) VALUES('-')
INSERT INTO Punctuation (Symbol) VALUES('.')

Next, you could create a function in SQL to remove all the punctuation symbols from an input string.

CREATE FUNCTION dbo.fn_RemovePunctuation
(
    @InputString VARCHAR(500)
)
RETURNS VARCHAR(500)
AS
BEGIN
    SELECT
        @InputString = REPLACE(@InputString, P.Symbol, '')
    FROM 
        Punctuation P

    RETURN @InputString
END
GO

Then you can just call the function in your UPDATE statement

UPDATE tblMyTable SET FieldName = dbo.fn_RemovePunctuation(FieldName)
Tim C
  • 70,053
  • 14
  • 74
  • 93
  • @TimC: Thanks a lot I liked this solution and implemented it, but came up with a problem. Once I take the punctuation out I intend on using it in a computed column that is persisted so I can create a Full Text Index on it. However I can't persist the column that uses this method. Any ideas? – Ev. Dec 03 '09 at 00:53
12

I wanted to avoid creating a table and wanted to remove everything except letters and digits.

DECLARE @p int
DECLARE @Result Varchar(250)
DECLARE @BadChars Varchar(12)
SELECT @BadChars = '%[^a-z0-9]%'
-- to leave spaces - SELECT @BadChars = '%[^a-z0-9] %'

SET @Result = @InStr

SET @P =PatIndex(@BadChars,@Result)
WHILE @p > 0 BEGIN
    SELECT @Result = Left(@Result,@p-1) + Substring(@Result,@p+1,250)
    SET @P =PatIndex(@BadChars,@Result)
    END
Ken Mc
  • 121
  • 1
  • 2
  • I think that the commented line to leave spaces would be `DECLARE @BadChars VARCHAR(13) = '%[^ a-z0-9]%'`. – Guile May 04 '21 at 18:28
7

I am proposing 2 solutions

Solution 1: Make a noise table and replace the noises with blank spaces

e.g.

DECLARE @String VARCHAR(MAX)
DECLARE @Noise TABLE(Noise VARCHAR(100),ReplaceChars VARCHAR(10))
SET @String = 'hello! how * > are % u (: . I am ok :). Oh nice!'

INSERT INTO @Noise(Noise,ReplaceChars)
SELECT '!',SPACE(1) UNION ALL SELECT '@',SPACE(1) UNION ALL
SELECT '#',SPACE(1) UNION ALL SELECT '$',SPACE(1) UNION ALL
SELECT '%',SPACE(1) UNION ALL SELECT '^',SPACE(1) UNION ALL
SELECT '&',SPACE(1) UNION ALL SELECT '*',SPACE(1) UNION ALL
SELECT '(',SPACE(1) UNION ALL SELECT ')',SPACE(1) UNION ALL
SELECT '{',SPACE(1) UNION ALL SELECT '}',SPACE(1) UNION ALL
SELECT '<',SPACE(1) UNION ALL SELECT '>',SPACE(1) UNION ALL
SELECT ':',SPACE(1)

SELECT @String = REPLACE(@String, Noise, ReplaceChars) FROM @Noise
SELECT @String Data

Solution 2: With a number table

DECLARE @String VARCHAR(MAX)
SET @String = 'hello! & how * > are % u (: . I am ok :). Oh nice!'

;with numbercte as
(
 select 1 as rn
 union all
 select rn+1 from numbercte where rn<LEN(@String)
)
select REPLACE(FilteredData,'&#x20;',SPACE(1)) Data from 
(select SUBSTRING(@String,rn,1) 
from numbercte  
where SUBSTRING(@String,rn,1) not in('!','*','>','<','%','(',')',':','!','&','@','#','$')

for xml path(''))X(FilteredData)

Output(Both the cases)

Data

hello  how   are  u  . I am ok . Oh nice

Note- I have just put some of the noises. You may need to put the noises that u need.

Hope this helps

priyanka.sarkar
  • 25,766
  • 43
  • 127
  • 173
  • @pewned. Thanks a lot. I ended up using a different idea, but this worked and answered my question. One question for you though, I need to use this data in a computed column that is persisted, but I can't persist it there. Any ideas? I'll probably make a new question for this. Thanks again. – Ev. Dec 03 '09 at 00:58
3

You can use regular expressions in SQL Server - here is an article based on SQL 2005:

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Fenton
  • 241,084
  • 71
  • 387
  • 401
  • It may be that CLR can't be used in your environment – gbn Nov 30 '09 at 10:16
  • @Sohnee - thanks for the fast reply. I've not tried this method yet, because I need to use this data in a Full Text Index, and I'm thinking if I use this I won't be able to make the column persistent. Is that correct? Although I'm definitely going to try something like that next time its' feasible. Thanks. – Ev. Dec 03 '09 at 00:51
1

I'd wrap it in a simple scalar UDF so all string cleaning is in one place if it's needed again.

Then you can use it on INSERT too...

gbn
  • 422,506
  • 82
  • 585
  • 676
1

I took Ken MC's solution and made it into an function which can replace all punctuation with a given string:

----------------------------------------------------------------------------------------------------------------
-- This function replaces all punctuation in the given string with the "replaceWith" string
----------------------------------------------------------------------------------------------------------------
IF object_id('[dbo].[fnReplacePunctuation]') IS NOT NULL
BEGIN
    DROP FUNCTION [dbo].[fnReplacePunctuation];
END;
GO
CREATE FUNCTION [dbo].[fnReplacePunctuation] (@string NVARCHAR(MAX), @replaceWith NVARCHAR(max))
RETURNS NVARCHAR(MAX)
BEGIN
    DECLARE @Result Varchar(max) = @string;
    DECLARE @BadChars Varchar(12) = '%[^a-z0-9]%'; -- to leave spaces - SELECT @BadChars = '%[^a-z0-9] %'
    DECLARE @p int = PatIndex(@BadChars,@Result);
    DECLARE @searchFrom INT;
    DECLARE @indexOfPunct INT = @p;

    WHILE @indexOfPunct > 0 BEGIN
      SET @searchFrom = LEN(@Result) - @p;
      SET @Result = Left(@Result, @p-1) + @replaceWith + Substring(@Result, @p+1,LEN(@Result));
      SET @IndexOfPunct = PatIndex(@BadChars, substring(@Result, (LEN(@Result) - @SearchFrom)+1, LEN(@Result)));
      SET @p = (LEN(@Result) - @searchFrom) + @indexOfPunct;
    END
    RETURN @Result;
END;
GO
-- example:
SELECT dbo.fnReplacePunctuation('This is, only, a tést-really..', '');

Output:

Thisisonlyatéstreally
0

If it's a one-off thing, I would use a C# + LINQ snippet in LINQPad to do the job with regular expressions.

It is quick and easy and you don't have to go through the process of setting up a CLR stored procedure and then cleaning up after yourself.

Tiberiu Ana
  • 3,663
  • 1
  • 24
  • 25
  • Thanks a lot, it's not a one of thing in this case, but that's a good looking tool for me to use in the future. Cheers. – Ev. Nov 30 '09 at 23:00
0

Can't you use PATINDEX to only include NUMBERS and LETTERS instead of trying to guess what punctuation might be in the field? (Not trying to be snarky, if I had the code ready, I'd share it...but this is what I'm looking for).

Seems like you need to create a custom function in order to avoid a giant list of replace functions in your queries - here's a good example:

http://www.codeproject.com/KB/database/SQLPhoneNumbersPart_2.aspx?display=Print

mahalie
  • 2,647
  • 20
  • 21