I have a C# program that is reading data from a file, then inserting unique strings into a SQL table (via System.Data.SqlClient). The problem is that SQL considers certain strings to be equivalent when C# doesn't.
Here's an example. Suppose my C# program generated the first two lines of the SQL script below. Clearly, C# is going to consider a "2" and a "superscript-2" as different, but, as you can see, SQL doesn't:
DECLARE @S1 NVARCHAR(10) = '2';
DECLARE @S2 NVARCHAR(10) = '²';
DROP TABLE IF EXISTS #Test;
CREATE TABLE #Test(S NVARCHAR(10) NOT NULL);
CREATE UNIQUE INDEX I1 ON #Test(S);
INSERT #Test SELECT @S1;
INSERT #Test SELECT @S2;
Msg 2601, Level 14, State 1, Line 7
Cannot insert duplicate key row in object 'dbo.#Test' with unique index 'I1'. The duplicate key value is (²).
The statement has been terminated.
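By contrast, C#'s default string comparisons are ordinal, so the same two values come out distinct. A minimal check:

```csharp
using System;

// C#'s default string equality is ordinal (code point by code point),
// so U+0032 ('2') and U+00B2 ('²') are different strings:
Console.WriteLine("2" == "²");                                        // False
Console.WriteLine(string.Equals("2", "²", StringComparison.Ordinal)); // False
```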
I can work around the problem, sort of, by using the C# String.Normalize method, but this actually changes the strings, which I'm not keen on. What I would ideally like is a way to compare two strings in C# the same way SQL does, i.e. according to the column's collation.
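For illustration, something like the comparer below is what I'm after. This is only a sketch: the type name is mine, and the mapping of CompareOptions flags to collation sensitivities (IgnoreCase ~ CI, IgnoreNonSpace ~ AI, IgnoreKanaType/IgnoreWidth ~ the absence of KS/WS) is my guess and would need validating against the actual server collation.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical comparer approximating a SQL Server Windows collation
// via .NET's culture-sensitive comparison. Whether it agrees with the
// server for any given pair of strings must be verified: .NET Framework
// compares via Windows NLS, while .NET 5+ defaults to ICU.
sealed class SqlLikeStringComparer : IEqualityComparer<string>
{
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _options;

    public SqlLikeStringComparer(CultureInfo culture, CompareOptions options)
    {
        _compareInfo = culture.CompareInfo;
        _options = options;
    }

    // Culture-sensitive equality; two strings are "the same" whenever
    // the configured comparison says they sort identically.
    public bool Equals(string x, string y) =>
        _compareInfo.Compare(x, y, _options) == 0;

    // Sort keys are byte sequences that agree with Compare, so hashing
    // the sort key keeps GetHashCode consistent with Equals.
    // (Nulls are not expected here: the column is NOT NULL.)
    public int GetHashCode(string s)
    {
        byte[] key = _compareInfo.GetSortKey(s, _options).KeyData;
        int hash = 17;
        foreach (byte b in key)
            hash = hash * 31 + b;
        return hash;
    }
}
```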
Any solutions to this?
I should add: the whole purpose of this exercise is performance. On program startup, the C# program reads the current contents of the string table into memory, then reads the data file, adding new strings to the in-memory table as needed. Once all the data has been read, the inserts are executed from C# against SQL.
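For that in-memory dedupe pass, a culture-sensitive comparer can be plugged straight into the set. A sketch, with the caveat that whether it matches the server's collation for strings like '2' and '²' needs testing (again because of the NLS-vs-ICU difference between .NET Framework and .NET 5+):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Culture-sensitive, case-insensitive set, loosely analogous to a CI
// Windows collation on the server. The culture is illustrative; it
// should be the one underlying the column's actual collation.
var known = new HashSet<string>(
    StringComparer.Create(new CultureInfo("en-US"), ignoreCase: true));

known.Add("2");
// Add returns false exactly when the comparer treats the new value as
// equal to one already present, i.e. when the unique index would reject
// the insert under a matching collation.
bool isNew = known.Add("²");
```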
We have the same problem in SSIS, by the way, when looking up codes read from text files and preparing inserts for SQL.