4

I've run into a problem in a project I'm working on: some of the string values in a specific SQL Server 2008 table column contain Unicode characters. For example, instead of a dash some strings will instead contain an EM DASH (http://www.fileformat.info/info/unicode/char/2014/index.htm).

The column values that contain Unicode characters are causing problems when I send HTTP requests to a third-party server. Is there a way to query what rows contain one-or-more Unicode characters, so I can at least begin to identify how many rows need to be fixed?

Mass Dot Net
  • 2,150
  • 9
  • 38
  • 50
  • 1
    Well if your data type is unicode then all of the characters will be Unicode. Do you want to find all values which can't be round tripped to single byte in your default collation without data loss? – Martin Smith Mar 03 '11 at 17:54
  • I think single-byte items may still include invisible control characters that I don't care about. It would be more accurate to say that I want to find all strings that contain one or more characters outside ASCII characters 32-126. – Mass Dot Net Mar 09 '11 at 19:47

3 Answers3

18

You want to find all strings that contain one or more characters outside ASCII characters 32-126.

I think this should do the job.

SELECT *
FROM your_table
WHERE your_column LIKE N'%[^ -~]%' collate Latin1_General_BIN
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
2

One way you can do it is to see which rows no longer equal themselves when converted to a datatype that doesn't support unicode.

CREATE TABLE myStrings (
    string nvarchar(max) not null
)

INSERT INTO myStrings (string)
SELECT 'This is not unicode' union all
SELECT 'This has '+nchar(500)+' unicode' union all
SELECT 'This also does not have unicode' union all
SELECT 'This has lots of unicode '+nchar(600)+nchar(700)+nchar(800)+'!'

SELECT cast(string as varchar)
FROM myStrings

SELECT *
FROM myStrings
WHERE cast(cast(string as varchar(max)) as nvarchar(max)) <> string
Aaron Silverman
  • 22,070
  • 21
  • 83
  • 103
  • what did your results look like? I tried something similar and it failed. I think it had something to do with DASH and HYPHEN – Richard Mar 23 '17 at 19:34
-1
SELECT *
FROM your_table
WHERE your_column LIKE N'%[^ -~]%' collate Latin1_General_BIN

finds all strings that contain one or more characters within ASCII characters 32-126.

I thought the purpose was to find strings where ASCII characters are not in the range 32-126?

NOT is possible with LIKE. Wouldn't this work?

SELECT *
FROM your_table
WHERE your_column NOT LIKE N'%[^ -~]%'

No collate required.

user2466387
  • 59
  • 1
  • 5