0

Suppose i have the following regex pattern of:

%[a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%

How could I iterate through a strings characters and remove (or replace with a blank) any characters that don't match the pattern?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
rmon2852
  • 220
  • 1
  • 7
  • 19

3 Answers3

3

This:

PatIndex('%[^a-zA-Z0-9+&@#\/%=~_|$?!-:,.']%', YourValue)

will return the character at which the pattern matches. In this case, I've added ^ to the beginning so that the pattern matches everything not in the character set.

You can then remove the character at that position, and continue, or replace all occurrences of the found character in the entire string.

FYI: to simulate the offset parameter of CharIndex in order to search starting at a certain character position, you can use Substring to get a portion of the string (or even one character) and use PatIndex on that.

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • So using this I would have to wrap it in a while loop to factor in the scenario where more than one invalid character could be in the string correct? – rmon2852 Jan 13 '16 at 16:43
  • Correct. There's no way to get some kind of map of the list of characters that don't match the pattern. You get to find out about one at a time. – ErikE Jan 13 '16 at 17:30
3

Ok based on ErikE's help I made the following function

     CREATE FUNCTION dbo.RemoveInvalidCharacters
 (
@pattern as varchar(max),
@string as varchar(max)
)
RETURNS varchar(MAX)
AS
BEGIN

    WHILE PatIndex(@pattern,@string) >0
      BEGIN
      SET @string=REPLACE(@string,SUBSTRING(@string,PatIndex(@pattern,@string),1),'')
END
RETURN @String

END
GO   

Then all I do was call the function as:

  select dbo.RemoveInvalidCharacters('%[^a-zA-Z0-9+&@#\/%=~_|$?!:,.''-]%','Test€€€String^^^')

Then the output is: TestString

rmon2852
  • 220
  • 1
  • 7
  • 19
0

As ErikE mentions in his answer, the PATINDEX method is what you need, it is SLOW though.

An answer to a very similar question on SO is stackoverflow.com/a/23001916/4316831. The below code was taken from there:

DECLARE @counter int

SET @counter = 0

WHILE(@counter < (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN  

    WHILE 1 = 1
    BEGIN
        DECLARE @RetVal varchar(50)

        SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
        FROM Table
        WHERE ID_COLUMN = @counter)

        IF(@RetVal IS NOT NULL)       
          UPDATE Table SET
          Column = @RetVal
          WHERE ID_COLUMN = @counter
        ELSE
            break
    END

    SET @counter = @counter + 1
END
Community
  • 1
  • 1
Owain Williams
  • 2,387
  • 17
  • 22
  • This is not much more than a link-only answer, which is frowned on. Could you expand on this a bit? – ErikE Jan 13 '16 at 17:31
  • Good shout @ErikE, I wasn't sure of the etiquette of pasting someone else's code block, but I suppose if I credit it correctly it's okay. – Owain Williams Jan 13 '16 at 18:07
  • Now that I think about it, perhaps it's better to flag this question as a duplicate of that one instead of just repeating the answer... ? – ErikE Jan 13 '16 at 18:11
  • I don't have a high enough reputation to do that, hence the link to the answer. – Owain Williams Jan 13 '16 at 18:16