2

I'm trying to process a string so that only uppercase letters and digits are taken. Trying to use PATINDEX because I want to keep this concise - it will be part of a bigger query.

There are quite a few examples out there showing something similar, however I seem to be missing something. So far I've got

DECLARE @string varchar(100)

SET @string = 'AbcDef12-Ghi'

WHILE PATINDEX('%[^A-Z0-9]%',@string) <> 0
    SET @string = STUFF(@string,PATINDEX('%[^A-Z0-9]%',@string),1,'')

SELECT @string

As per e.g. this question, however this doesn't seem to be doing what I need.

Using [^0-9] outputs 12, which is correct.

However [^A-Z] outputs AbcDefGhi, i.e. both uppercase and lowercase. Doesn't matter if I use [^A-Z] or [^a-z], both return all letters.

I think I may be missing something simple? I found a few suggestions referring to collation and tried to use COLLATE with PATINDEX as per this but couldn't get it to work.

Note: I'm doing this as a one-off query on a SQL Server database to find some data - this will not be reused anywhere else so things like performance, SQL injection (mentioned quite frequently in this context) etc. are not a concern.

Community
  • 1
  • 1
Boris
  • 1,180
  • 1
  • 18
  • 29
  • Maybe you could use tally table + substring for 1 character and compare that to allowed characters? Something similar that DelimitedSplit8k uses http://www.sqlservercentral.com/articles/Tally+Table/72993/ – James Z Feb 18 '15 at 17:29
  • @Blam thanks - I did more research on this. JamesZ, the solution below works for me plus I'm not registered at SQL Server Central. – Boris Feb 19 '15 at 08:45

1 Answers1

7

You can do this using COLLATE and the a case sensitive collation such as Latin1_General_BIN:

DECLARE @string varchar(100)

SET @string = 'AbcDef12-Ghi'

WHILE PATINDEX('%[^A-Z0-9]%',@string COLLATE Latin1_General_BIN) <> 0
BEGIN
    SET @string = STUFF(
        @string,
        PATINDEX('%[^A-Z0-9]%',@string COLLATE Latin1_General_BIN),1,''
    )
END

SELECT @string

This will output:

AD12G

DavidG
  • 113,891
  • 12
  • 217
  • 223
  • Thanks for your help, this works! Weird, I remember trying this exactly with Latin1_General_BIN yesterday but maybe it only works if it's not 5pm :P – Boris Feb 19 '15 at 08:41
  • The logic in general is clear and sound, but I would use a variable to store the result of PATINDEX to avoid calling the function twice at every iteration. – Andriy M Feb 19 '15 at 17:46
  • 1
    @AndriyM You are correct and I would normally do that, but OP stated in the question that this is a one-off and performance is not an issue. – DavidG Feb 19 '15 at 17:47