In C# and SQL Server, how do I work with and display diacritical marks?

Question

I am building (I thought) a simple flash card program to help me learn Spanish. That is, English to Spanish, or flip to Spanish to English. I am using C# and SQL Server. The code is working fine except that I'm struggling to figure out how to work with the Spanish diacritical marks like "á, é, í, ó, ú, ñ". Specifically, while I can enter them into my prep Excel spreadsheet using the win10 special keyboard, and they display correctly, when I load them into my SQL Sever database, and access via VS C# Winforms program, they display incorrectly.

I have tried using nvarchar instead of varchar, and that just loads and displays these characters in a stranger format.

I have tried switching my default keyboard to Spanish but that totally switched me to Spanish. Also, note that when I work with them with Note++ they show correctly.

When I process them in c# they input correctly, but when processed with StreamWriter they don't. Before I load the Excel .csv data into a database, I process each to add some additional columns.

using (StreamReader reader = new StreamReader(File.OpenRead(INFILE)))
{
    using (StreamWriter writer = new StreamWriter(File.Open(OUTFILE, FileMode.Create)))
    {
        while ((sLineIn = reader.ReadLine()) != null)   // Process each input record until no more
        {
            iRecIn++;                       // Count input records

            if (bFirstRecord)
            {
                iRecOut++;                  // Account for title line
                bFirstRecord = false;       // Skip first record
                continue;
            }

            sLineOut = processLineOut(sLineIn);
            writeTheRecord(writer);
            continue;
        }
    }
}

I just plain don't know where to go next. Thanks!!

Part-2: My process is as follows:

Flash card data is entered into a general Excel 2016 file. One card per line as you can see in pic.
1. I save this file as a csv file.
2. Since my database table has an additional 10+ columns, I have a separate c# program that uses the just created csv file, and adds the remaining columns via the StreamReader/StreamWriter process. See the code snippet. The output file is also a csv file.
3. I run my sql server job that uses bulk insert to load the output from step 3 to the database.
4. I use my c# winform flashcard program to flash the card deck.

Regarding the database table, column is defined as: SpanWord varchar(100),

I switched it back from : SpanWord nvarchar(100),

since that wasn't working either.... Thanks

Part-3: I agree this is too vague and not concise. I will break it up into parts and work on each of them. Then move through the process.

I don't believe I can close this question so if someone with proper authority, please close this for me... Thanks again! I have enough to work on.

How is your column defined? As in the column that holds this data — Fabulous, Aug 20 '18 at 17:57
What has StreamReader\Writer have to do with SQL server or Excel? — Cetin Basoz, Aug 20 '18 at 18:44
OK you are not going to answer I think, check Northwind sample database. It is already full of those characters. MS SQL server supports unicode with NCHAR and NVARCHAR. — Cetin Basoz, Aug 20 '18 at 18:49

score 2 · Answer 1 · answered Aug 20 '18 at 23:05

There are too many possible issues in this question. You should focus on each step in the journey of the data and ensure that the data is still 'good'. Once you find a specific problem, you could then research that specific problem, and if you fail to find an answer, then you should create an MCVE of that problem, and then you could ask a question.

Here are some high level pointers: You should ensure that the program which is reading the CSV file is reading the characters correctly by debugging it. If it isn't, the most likely cause is the encoding: change the reader to decide the encoding based on the byte order mark like this:

new StreamReader(INFILE, Encoding.Default, true)

If this doesn't work, you can tell it specifically which encoding to use. (You can see the encoding of the file by opening it in Notepad and doing 'Save as...'.)

Then you also need to be sure you're writing the file with an appropriate encoding - I'd suggest using whatever encoding the input file is.

I suspect you'll need to change the database field to nvarchar, unless all the characters you require are covered by varchar. Once that is right, and the file has the correct encoding, the bulk insert of that file should work.

You haven't shown the code where you read the data from the database into the WinForms application; so I can only guess there is no problem there.

thanks for the advice. I will do as you suggested and start back with the input file. — John Slate, Aug 21 '18 at 00:18

score 0 · Answer 2 · answered Aug 21 '18 at 22:18

Thanks to Richardissimo! I broke the question into more manageable parts:

The c# console app that read in the csv file, added some more columns and then wrote out another csv file, required the encoding as Richard said.

    using (StreamReader reader = new StreamReader(File.OpenRead(INFILE),
        Encoding.Default, true))
    {
        using (StreamWriter writer = new StreamWriter(File.Open(OUTFILE, FileMode.Create),
            Encoding.Default))

This worked and I verified the output file contained the Spanish diacritical marks.

I still had an issue with the sql server 2016 database. I tried several variations of the COLLATE parameters but that didn't work on the create table. I found out that I needed to specify CODEPAGE = '1252' as a parameter on the BULK INSERT statement. Then once loaded, the SELECTs displayed the diacritical characters correctly, including my c# WinForms app.

NOTE: I did not need to change Spanish word column (SpanWord) to nvarchar. I left it as varchar since all the Spanish diacticals are covered in varchar. They are just greater than 127.

Per MS:

CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' }
Specifies the code page of the data in the data file. CODEPAGE is relevant
only if the data contains char, varchar, or text columns with character values
greater than 127 or less than 32.

Thanks to all!

In C# and SQL Server, how do I work with and display diacritical marks?

2 Answers2