1

I'm using C# to automate an insert into command for a users table, and there's a user whose first name has an accented E, with a grave I believe?

Desirée

Whenever it makes it into the SQL Server table it appears as:

Desir?e

Which data type should I use on this column to ensure that it keeps the accented e?

I've tried varchar and nvarchar, neither seemed to matter.

Code for inserting:

var lines = File.ReadAllLines(users_feed_file);

I believe that there is an encoding issue occurring. When Visual Studio reads my file it reads the name as Desir?e.

So far I've tried to overload the File method, using:

Encoding enc = new UTF8Encoding(true, true);
var lines = File.ReadAllLines(users_feed_file,enc);

But this had no effect.

var lines = File.ReadAllLines(users_feed_file, Encoding.UTF8);

Doesn't work either.

Christopher Bruce
  • 661
  • 3
  • 10
  • 24
  • The datatype of the column is definitely nvarchar. This is encoding issue, how are you inserting into the table? – Kiran Hegde Jun 16 '14 at 16:17
  • I've added my code showing how I'm inserting. It shouldn't be an issue though? I just dropped and re-created the table, the data definitely goes IN with an accent. Just doesn't come out with it. Interesting thing is that the users feed file is also used by the main database(not the temp one I'm working with), with the same column type, and it works just fine. – Christopher Bruce Jun 16 '14 at 16:19
  • I checked the column data types for the tables that get created and they are varchar. So it should be working. The users_feed_file table has the correct data type but it's still putting in a question mark. – Christopher Bruce Jun 16 '14 at 16:27
  • So it seems that Visual Studio is picking it up with the question mark, SQL Server is doing everything it's supposed to but when c# reads the lines it kills the accent. Any ideas? – Christopher Bruce Jun 16 '14 at 16:32
  • Please remove unrelated part of you sample (either reading from file OR writing to SQL since unlikely both are wrong), remove unnecessary string manipulations and SQL injection. Consider having sample with just hardcoded values for SQL call except single parameter in the SQL statement. – Alexei Levenkov Jun 16 '14 at 16:36
  • I'm trying to understand your steps to get the information in the database. Is that right : text file => C# reader program => SQL Insert queries => Database? If so, are the accented characters OK in that file? – ForguesR Jun 16 '14 at 18:57

2 Answers2

0

Sql Server stores unicode text essentially as Unicode-2 or UTF-16. That is, it uses fixed, two-bytes for all characters. UTF-8 uses variable three-bytes for all characters, using one, two, or three bytes as needed. If the character in questions (it would be good to post the actual unicode value) is translated by UTF-8 into three bytes, then Sql Server reads that back as two two-byte characters, one of which probably is not a valid, displayable character, thus rendering a question mark. Note that Sql Server is not storing a question mark, that is just how whatever text editor you are using renders this garbled character.

Try changing your C# encoding to Encoding.Unicode and see if that helps round-trip the character in question.

The same reasoning applies to characters that ought to fit into one-byte, but are represented with two by UTF-8. So for example, the unicode hex value for small e with grave is xE8, which could be represented as 00 E8 in two bytes. But UTF-8 renders it as C3 E8. Now, look for that value in Unicode (UTF-16) - there is no such character. So in this case it is not two bytes represented as three, but one byte represented incorrectly as two. This resource is invaluable when trying to debug extended character issues.

Note that for the basic Latin ascii set, UTF-8 uses the same values as Unicode, and thus those characters round-trip just fine. It's when using extended character sets that compatibility for both encodings cannot be guaranteed.

mdisibio
  • 3,148
  • 31
  • 47
0

Hi try with this code:

var lines = File.ReadAllLines(users_feed_file, Encoding.Unicode);

but in notepade++ you can view the file encoding, check this.

Chiro300
  • 76
  • 1
  • 7