1

I have this query:

select ' ' C union
select '*' C union
select '-' C
order by C

And the result is space, asterisk and dash, but if I have unicode characters like this:

select N' ' C union
select N'*' C union
select N'-' C
order by C

I get space, dash, asterisk. Can anyone explain why?

Thanks!

Khan
  • 17,904
  • 5
  • 47
  • 59
vgaspar.travix
  • 326
  • 1
  • 9

1 Answers1

3

I thought at first that this would be down to different default collations for varchar and nvarchar, but that doesn't seem to be the case. It does seem to vary a bit by collation (I don't see it with Latin1_General_CI_AS but I do if I use SQL_Latin1_General_CP1_CI_AS).

Looking into it further, I found this answer here on Stack Overflow, which references this article on MSDN, which has this to say about hyphens and Unicode:

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

So I've marked this answer a CW, as this is really just a duplicate of that answer (and the question a a duplicate of that question).

Community
  • 1
  • 1
T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875