43

I was actually asked this myself a few weeks ago, whereas I know exactly how to do this with a SP or UDF but I was wondering if there was a quick and easy way of doing this without these methods. I'm assuming that there is and I just can't find it.

A point I need to make is that although we know what characters are allowed (a-z, A-Z, 0-9) we don't want to specify what is not allowed (#@!$ etc...). Also, we want to pull the rows which have the illegal characters so that it can be listed to the user to fix (as we have no control over the input process we can't do anything at that point).

I have looked through SO and Google previously, but was unable to find anything that did what I wanted. I have seen many examples which can tell you if it contains alphanumeric characters, or doesn't, but something that is able to pull out an apostrophe in a sentence I have not found in query form.

Please note also that values can be null or '' (empty) in this varchar column.

Tim S. Van Haren
  • 8,861
  • 2
  • 30
  • 34
Jay
  • 4,240
  • 3
  • 26
  • 39
  • 2
    What version of SQL Server? This sounds like a regex job – OMG Ponies Dec 17 '09 at 06:04
  • @OMG Ponies I have updated the subject, it's more of a "I wonder how you could do this" rather than needed for a specific system, but simplicities sake lets just say it's 2008. – Jay Dec 17 '09 at 06:22

4 Answers4

78

Won't this do it?

SELECT * FROM TABLE
WHERE COLUMN_NAME LIKE '%[^a-zA-Z0-9]%'

Setup

use tempdb
create table mytable ( mycol varchar(40) NULL)

insert into mytable VALUES ('abcd')
insert into mytable VALUES ('ABCD')
insert into mytable VALUES ('1234')
insert into mytable VALUES ('efg%^&hji')
insert into mytable VALUES (NULL)
insert into mytable VALUES ('')
insert into mytable VALUES ('apostrophe '' in a sentence') 

SELECT * FROM mytable
WHERE mycol LIKE '%[^a-zA-Z0-9]%'

drop table mytable 

Results

mycol
----------------------------------------
efg%^&hji
apostrophe ' in a sentence
beach
  • 8,330
  • 3
  • 29
  • 25
  • LIKE doesn't support regexes, only the wildcard. And we don't know what version currently.. – OMG Ponies Dec 17 '09 at 06:09
  • We don't need a RegEx to solve this problem. Unless I am missing something. – beach Dec 17 '09 at 06:12
  • Let me rephrase - we don't need full RegEx support. Using the simple pattern matching that LIKE supports (or PATINDEX) should suffice. We only need to find a single character that is not in the allowed list. – beach Dec 17 '09 at 06:17
  • Could have sworn I did this at some point, I kind of assumed that because '' does not contain any alphanumeric characters that would be returned also. Thanks heaps, that's the answer :) – Jay Dec 17 '09 at 06:29
  • 1
    For me, my database is not case sensitive, so `COLUMN_NAME LIKE '%[^a-zA-Z0-9]%'` is the same as `COLUMN_NAME LIKE '%[^A-Z0-9]%'` or `COLUMN_NAME LIKE '%[^a-z0-9]%'` – Jaider Aug 16 '12 at 20:19
  • Work for me. Thanx. – TheZodchiy Sep 07 '16 at 13:00
10

Sql server has very limited Regex support. You can use PATINDEX with something like this

PATINDEX('%[a-zA-Z0-9]%',Col)

Have a look at PATINDEX (Transact-SQL)

and Pattern Matching in Search Conditions

Adriaan Stander
  • 162,879
  • 31
  • 289
  • 284
  • 6
    If you want to use PATINDEX, you need to do something like this: "SELECT * FROM mytable WHERE PATINDEX('%[^a-zA-Z0-9]%',mycol) > 1" – beach Dec 17 '09 at 06:19
  • 1
    Yes, this is correct. Selecting it in the column list will only show you where the values are. – Adriaan Stander Dec 17 '09 at 06:23
  • This is how I used it to delete rows with non latin caracters in a specific column: delete FROM [DefaultConnection].[dbo].[AspNetUsers] where PATINDEX('%[a-zA-Z0-9]%',country) = 0 – Ronen Festinger Jul 22 '16 at 18:05
3

I found this page with quite a neat solution. What makes it great is that you get an indication of what the character is and where it is. Then it gives a super simple way to fix it (which can be combined and built into a piece of driver code to scale up it's application).

DECLARE @tablename VARCHAR(1000) ='Schema.Table'
DECLARE @columnname VARCHAR(100)='ColumnName'
DECLARE @counter INT = 0
DECLARE @sql VARCHAR(MAX)

WHILE @counter <=255
BEGIN

SET @sql=

'SELECT TOP 10 '+@columnname+','+CAST(@counter AS VARCHAR(3))+' as CharacterSet, CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') as LocationOfChar
FROM '+@tablename+'
WHERE CHARINDEX(CHAR('+CAST(@counter AS VARCHAR(3))+'),'+@columnname+') <> 0'

PRINT (@sql)
EXEC (@sql)
SET @counter = @counter + 1
END

and then...

UPDATE Schema.Table
SET ColumnName= REPLACE(Columnname,CHAR(13),'')

Credit to Ayman El-Ghazali.

wwmbes
  • 301
  • 2
  • 4
2
SELECT * FROM TABLE_NAME WHERE COL_NAME LIKE '%[^0-9a-zA-Z $@$.$-$''''$,]%'

This works best for me when I'm trying to find any special characters in a string

Obsidian
  • 3,719
  • 8
  • 17
  • 30
Ayme
  • 21
  • 1