1

I have a C# program that is reading data from a file, then inserting unique strings into a SQL table (via System.Data.SqlClient). The problem is that SQL considers certain strings to be equivalent when C# doesn't.

Here's an example. Suppose my C# program generated the first two lines of the SQL script below. Clearly, C# is going to consider a "2" and a "superscript-2" as different, but, as you can see, SQL doesn't:

DECLARE @S1 NVARCHAR(10) = '2';
DECLARE @S2 NVARCHAR(10) = '²';
DROP TABLE IF EXISTS #Test;
CREATE TABLE #Test(S NVARCHAR(10) NOT NULL);
CREATE UNIQUE INDEX I1 ON #Test(S);
INSERT #Test SELECT @S1;
INSERT #Test SELECT @S2;

Msg 2601, Level 14, State 1, Line 7 Cannot insert duplicate key row in object 'dbo.#Test' with unique index 'I1'. The duplicate key value is (²). The statement has been terminated.

I can work around the problem, sort of, by using the C# String.Normalize function, but this actually changes the strings, which I'm not keen on. What I would ideally be able to do is to compare two string in C# the same way that SQL does.

Any solutions to this?

I should add: The whole purpose of this exercise is performance. On program startup, the C# program reads current list of the string table into memory, then reads the data file, adding strings to the memory table as needed. After all data is read, the inserts are executed from C# against SQL.

We have the same problem in SSIS, by the way, when doing lookups of codes read from text files and preparing inserts for SQL.

Hadi
  • 36,233
  • 13
  • 65
  • 124
BoCoKeith
  • 817
  • 10
  • 21
  • 2
    I think you need to set the columns collation when creating the temp table. https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-2017 –  Jun 03 '19 at 16:36
  • 1
    You can work the other way around: make SQL Server compare strings using a binary collation (`Latin1_General_BIN2`) for uniqueness and offer up a "regular" column with a non-painful interface as a computed one. The other way around is basically impossible (or at least really, *really* impractical): it would require a grounds-up reimplementation of the SQL Server collation code in managed code. – Jeroen Mostert Jun 03 '19 at 16:37
  • I guess you could just do a sql merge instead of a sql insert and let the server decide whether to insert the record or not. I'm assuming that duplicates like your example above would amount to, like, 1% of cases. – Kris Jun 03 '19 at 16:37
  • SQL Server is case insensitive by default, but that can be changed if you need it to be sensitive. Otherwise, I usually use `string.Compare(stringA, stringB, true) === 0`. The last param is `ignoreCase`. If you're using a dictionary to store things, those can also be made case insensitive: https://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary – ps2goat Jun 03 '19 at 16:37
  • The other basic approach is to ask SQL Server to eliminate/consolidate duplicates with a separate query (you can use a table-valued parameter for this), and use the results back in C#, essentially farming the task of duplicate checking off to the implementing code. In SSIS, you can do this directly by eliminating its built-in memory cache, which will cause it to issue separate queries per value (obviously, this can have a detrimental effect on performance, though). – Jeroen Mostert Jun 03 '19 at 16:38
  • Besides the excellent comments referencing collation, the SSIS tool for handling this is the Lookup Component – billinkc Jun 03 '19 at 18:18

1 Answers1

1

I just discovered System.Data.SqlTypes.SqlString.Equals. For instance,

var b = SqlString.Equals("2", "²");

is true. This doesn't fix the SSIS part of the problem, but it is very promising for C#.

BoCoKeith
  • 817
  • 10
  • 21
  • 1
    Be careful with this. What you're getting here is a comparison based on `CultureInfo.CurrentCulture`. This need not have any relationship to the collation of the columns you're comparing. `SqlString` ultimately delegates to `CultureInfo` implementations with some sauce on the side to support some SQL collation options like accent-sensitivity -- it is almost, but not *completely* the same thing as what SQL Server itself uses even for matching cultures, especially if any legacy collations are used. – Jeroen Mostert Jun 03 '19 at 16:43