2

I am using C# and .Net 4.0 with MS SQL 2008.

I am running an integration test to verify that data is getting correctly stored and retrieved. It fails more often than not. When I look into it I see that I am getting the wrong value back from the linq-to-sql call. I have profiled the linq-to-sql statement and discovered that in Server Management Studio, the profiled SQL returns the wrong value, while a hand typed query with the same parameters works correctly.

The linq-to-sql query and result:

exec sp_executesql N'SELECT TOP (1) [t0].[ID], [t0].[UserName], [t0].TCID
FROM [dbo].[Users] AS [t0]
WHERE ([t0].[TCID] = @p0) AND ([t0].[UserName] = @p1)',N'@p0 int,@p1
nvarchar(4000)',@p0=8,@p1=N'ҭРӱґѻ'

Results in

ID        UserName    TCID
2535      ҭРґѻӱ       8

As you can see, UserName does not match what was in the equality check.

If I do this, I get the expected result:

SELECT TOP 1000 [ID]
    ,[UserName]
    ,[TCID]
FROM [dbo].[Users]
where TCID=8 and username = 'ҭРӱґѻ'

I get back:

ID        UserName    TCID

Which is correct.

UserName is nvarchar(50), ID and TCID are int.

Any ideas why the first query gets the wrong result?

Steve
  • 1,557
  • 2
  • 17
  • 34

2 Answers2

2

You're not getting results on the second query because you forgot to prefix the parameter with N. I bet you get a result just like with the dynamic SQL if you use:

SELECT TOP 1000 [ID]
    ,[UserName]
    ,[TCID]
FROM [dbo].[Users]
where TCID=8 and username = N'ҭРӱґѻ'; -- note the N prefix here

Now, I'm not saying you should get a result, but that should make the behavior consistent between your two testing methods. What is the collation of the column? You can "fix" this in a way by specifying a binary collation. For example, this should yield proper behavior:

SELECT COUNT(*) 
  FROM [dbo].[Users]
  WHERE [UserName] = N'ҭРӱґѻ' COLLATE Latin1_General_BIN;

-- 0

SELECT COUNT(*) 
  FROM [dbo].[Users]
  WHERE [UserName] = N'ҭРґѻӱ' COLLATE Latin1_General_BIN;

-- 1
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • Interesting. So why would the result not match the input string? The result is shorter and the characters not in the same order. – Steve Sep 10 '11 at 02:12
  • 1
    It has to do with the way the `NVARCHAR` data is stored/encoded. I am the last person who'll be able to describe to you how UCS-2 / UTF-16 work, I can only tell you how to fix it. :-) Oh and I hope I've convinced you that the problem isn't with linq-to-sql. :-) – Aaron Bertrand Sep 10 '11 at 02:14
  • 2
    These questions may or may not yield some more clues: http://stackoverflow.com/questions/5655302/what-are-the-consequences-of-storing-a-c-string-utf-16-in-a-sql-server-nvarcha and http://stackoverflow.com/questions/805259/storing-utf-16-unicode-data-in-sql-server – Aaron Bertrand Sep 10 '11 at 02:23
  • Thanks, I'm very sure it is not linq-to-sql. I am misunderstanding how UTF-16 is handled in sql 2008. I don't know if I need a different collation to get the expected results or what. Your additional link is most helpful. – Steve Sep 10 '11 at 17:36
1

With the collation you are using (probably a SQL Server-specific collation), some Unicode code points are not defined. Thus SQL Server treats them as if they were an empty string:

SELECT CASE WHEN N'ӱ' COLLATE SQL_Latin1_General_CP1_CI_AS = N'' THEN 'YES' ELSE 'NO' END

If we use a newer Windows collation such as Cyrillic_General_100_CI_AS, we see that these strings do not match:

SELECT CASE WHEN N'ӱ' COLLATE Cyrillic_General_100_CI_AS = N'' THEN 'YES' ELSE 'NO' END

Here's a blog post on MSDN that should explain more.

Noel
  • 119
  • 3