It seems that SQL Server 2008 removes some unicode characters when comparing two strings. Consider the following table:
CREATE TABLE [dbo].[Test](
[text] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[text] ASC
))
And now if I insert some rows with unicode characters:
insert into Test values(N'it᧠')
insert into Test values(N'it')
I get a unique constraint exception, even though the values are different. I'm using the default database collation here, which is SQL_Latin1_General_CP1_CI_AS.
Violation of PRIMARY KEY constraint 'PK_Test'. Cannot insert duplicate key in object 'dbo.Test'.
Note, that this doesn't happen for all unicode characters, but only for some characters, but I haven't been able to identify which unicode ranges exactly are problematic. For instance, the dingbat 0x2757 (❗) is removed in comparison, but 0x2764 (♥) is not. I guess it has something to do with 0x2757 being from a newer unicode standard.
So the question is, is there any way to make SQL Server 2008 work with these characters, or alternatively, can I programatically detect them (in C#, via unicode ranges or some such) and remove them beforehand?