I thought at first that this would be down to different default collations for varchar
and nvarchar
, but that doesn't seem to be the case. It does seem to vary a bit by collation (I don't see it with Latin1_General_CI_AS
but I do if I use SQL_Latin1_General_CP1_CI_AS
).
Looking into it further, I found this answer here on Stack Overflow, which references this article on MSDN, which has this to say about hyphens and Unicode:
A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.
So I've marked this answer a CW, as this is really just a duplicate of that answer (and the question a a duplicate of that question).