I'm using SQL Server 2014 Enterprise and Visual Studio 2103.
I have hundreds of TSV files that contain foreign characters that I'm importing into SQL Server. I have a SSIS package that automates this (just a script task that I found online that uses C#). Tables are created with NVARCHAR(MAX)
datatype for all columns, then each file is read line by line by the script, with the values inserted into the tables.
The source TSV files are exported as Unicode, but SQL Server doesn't seem to care - it imports the files as VARCHAR
(i.e., Chinese characters come over as "?????"). If you manually import the file into SQL Server, the code page shows "65001 (UTF-8)" so I'm not sure why the datatypes default to VARCHAR
.
Now, I suppose I can configure a DATA CONVERSION TRANSFORM for each of the files, but there are too many files and I'm thinking this can be done on the fly within the script task insert:
SCRIPT TASK:
Some variables for encoding:
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
Encoding utf8 = Encoding.UTF8;
Encoding utf32 = Encoding.UTF32;
The following part of the script task code is where I try to convert the encoding (the first part of the IF statement (not shown) creates the receiving table). It errors out where indicated:
else
{
//ADJUST FOR SINGLE QUOTES:
line = line.Replace("'", "''");
byte[] unicodeBYTES = unicode.GetBytes(line);
byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES); <--- ERRORS OUT
char[] unicodeCHARS = new char[unicode.GetCharCount(unicodeCONVERT, 0, unicodeCONVERT.Length)];
unicode.GetChars(unicodeCONVERT, 0, unicodeCONVERT.Length, unicodeCHARS, 0);
string NEWline = new string(unicodeCHARS);
string query = "Insert into " + SchemaName + ".[" + TableName + "] (" + ColumnList + ") ";
query += "VALUES('" + NEWline + "')";
// MessageBox.Show(query.ToString());
SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection);
myCommand1.ExecuteNonQuery();
}
However, If I change the line:
byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES);
to the following:
byte[] unicodeCONVERT = Encoding.Convert(unicode, unicode, unicodeBYTES);
It loads the data, but is still in ASCII format (with "?????" characters).
Any help would be appreciated.
Thank you.