0

I'm using SQL Server 2014 Enterprise and Visual Studio 2103.

I have hundreds of TSV files that contain foreign characters that I'm importing into SQL Server. I have a SSIS package that automates this (just a script task that I found online that uses C#). Tables are created with NVARCHAR(MAX) datatype for all columns, then each file is read line by line by the script, with the values inserted into the tables.

The source TSV files are exported as Unicode, but SQL Server doesn't seem to care - it imports the files as VARCHAR (i.e., Chinese characters come over as "?????"). If you manually import the file into SQL Server, the code page shows "65001 (UTF-8)" so I'm not sure why the datatypes default to VARCHAR.

Now, I suppose I can configure a DATA CONVERSION TRANSFORM for each of the files, but there are too many files and I'm thinking this can be done on the fly within the script task insert:

SCRIPT TASK:

Some variables for encoding:

  Encoding ascii = Encoding.ASCII;
  Encoding unicode = Encoding.Unicode;
  Encoding utf8 = Encoding.UTF8;
  Encoding utf32 = Encoding.UTF32;

The following part of the script task code is where I try to convert the encoding (the first part of the IF statement (not shown) creates the receiving table). It errors out where indicated:

else
{
  //ADJUST FOR SINGLE QUOTES:
  line = line.Replace("'", "''");
  byte[] unicodeBYTES = unicode.GetBytes(line);
  byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES);  <--- ERRORS OUT
  char[] unicodeCHARS = new char[unicode.GetCharCount(unicodeCONVERT, 0, unicodeCONVERT.Length)];
  unicode.GetChars(unicodeCONVERT, 0, unicodeCONVERT.Length, unicodeCHARS, 0);
  string NEWline = new string(unicodeCHARS);

  string query = "Insert into " + SchemaName + ".[" + TableName + "] (" + ColumnList + ") ";
  query += "VALUES('" + NEWline + "')";

  //  MessageBox.Show(query.ToString());

  SqlCommand myCommand1 = new SqlCommand(query, myADONETConnection);
  myCommand1.ExecuteNonQuery();
}

However, If I change the line:

byte[] unicodeCONVERT = Encoding.Convert(unicode, utf8, unicodeBYTES);

to the following:

byte[] unicodeCONVERT = Encoding.Convert(unicode, unicode, unicodeBYTES);

It loads the data, but is still in ASCII format (with "?????" characters).

Any help would be appreciated.

Thank you.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Craig
  • 145
  • 1
  • 12
  • Changing `VALUES('"` to `VALUES(N'"` is the tactical fix (see https://stackoverflow.com/questions/10025032/what-is-the-meaning-of-the-prefix-n-in-t-sql-statements). But the proper fix is not to do `query += "VALUES('" + NEWline + "')";` and instead use SQL parameters. – mjwills May 23 '18 at 01:09
  • Possible duplicate of [What are good ways to prevent SQL injection?](https://stackoverflow.com/questions/14376473/what-are-good-ways-to-prevent-sql-injection) – mjwills May 23 '18 at 01:10
  • 1
    Strings in .NET are inherently Unicode. All you need is a parameterized query and ditch the conversion ugliness. Strongly-typed parameters also avoid the need to escape quotes, format dates, and specify proper decimal separators that may vary by culture. – Dan Guzman May 23 '18 at 01:33
  • @DanGuzman: Thank you. Any links that can assist? I would like to see a similar situation to mine - unless I'm reading the above links incorrectly. – Craig May 23 '18 at 01:42
  • There are [examples in the documentation](https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.parameters.aspx). – Dan Guzman May 23 '18 at 01:58
  • I would strongly suggest reading up on SQL injection. Otherwise a TSV file could end up deleting data. :) – mjwills May 23 '18 at 05:28

0 Answers0