Convert Unicode characters with C#

Question

I have a client who has asked us to write a C# application that takes data from their database and outputs it to a .csv file. So far so good.

This DB contains some Unicode characters, and when the client opens the .csv in Excel those characters look "weird". For example, x0096 looks like an A with a caret on top next to the Euro currency sign, when the client thinks it should look like a dash.

So I have been asked to make those characters look "not weird".

I have written code for each weird character (I have like 12 of the below lines).

input = input.Replace((char)weirdCharacter, (char)normalCharacter);

There has got to be a better way.

The first thought is to make an array of weird and normal characters, and loop through it (rather than one line per...). But it's still a bit kluge-ey. — Floris, Jan 23 '13 at 14:43
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html) — mellamokb, Jan 23 '13 at 14:45
Can your answer be found here: http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net — Floris, Jan 23 '13 at 14:47
Yeah. It was an encoding issue. I used a solution similar to what VJ mentioned below. The link that Floris mentioned had some code that was useful too. Thanks to all! — Erik Volkening, Jan 23 '13 at 15:46

score 1 · Accepted Answer · answered Jan 23 '13 at 14:45

I had the same problem when I was generating HTML files. The solution for me was to change the encoding of my output file.

StreamWriter swHTMLPage = 
                new System.IO.StreamWriter(OutputFileName, false, Encoding.UTF8);

Once I added the Encoding.UTF8 parameter the characters started displaying correctly. I don't know if this can be applied to your solution though since Excel is involved, but I am betting it can be.
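
For the CSV case, a minimal sketch of the same idea might look like the following. It assumes the strings coming out of the database are already decoded correctly, and that the client's Excel recognizes UTF-8 from the byte order mark (BOM) at the start of the file, which is worth verifying against their version. The class and method names (CsvExport, WriteCsv) are made up for illustration.

```csharp
using System.IO;
using System.Text;

public static class CsvExport
{
    // Hypothetical helper: writes the given lines to a CSV file as UTF-8 with a BOM.
    public static void WriteCsv(string path, string[] lines)
    {
        // Passing true asks the encoding to emit the UTF-8 BOM (EF BB BF).
        // Assumption: Excel uses this marker to pick UTF-8 when opening the file.
        var utf8WithBom = new UTF8Encoding(true);

        // append: false overwrites any existing file, so the BOM lands at byte 0.
        using (var writer = new StreamWriter(path, false, utf8WithBom))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }
}
```

Encoding.UTF8 also emits a BOM when passed to StreamWriter; the explicit UTF8Encoding(true) just makes that intent visible.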

Vincent James


Assuming that the database uses UTF-8 throughout, this should do the trick. – Tyler Crompton Jan 23 '13 at 14:46

JLRishe · Answer 2 · 2013-01-23T15:00:32.697

As Vincent James says, if this is an encoding issue, then the ideal way to fix this is to just use the right encoding when you decode/encode the value, but if that still doesn't work...

I think this is pretty straightforward. What do you think?

Dictionary<char, char> substitutions = new Dictionary<char, char> {
  {'\u0096', 'F'}, {'\u0101', 'O'}, {'\u0121', 'O'}, ...
};

// Strings are immutable, so the result of Replace must be assigned back.
foreach (KeyValuePair<char, char> pair in substitutions)
{
   input = input.Replace(pair.Key, pair.Value);
}
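
If the substitution table grows, one option is to walk the string once instead of calling Replace once per entry. The following is only a sketch under assumptions: the class name CharacterCleaner and the three mappings are hypothetical, chosen to illustrate the dash case from the question, and the real pairs depend on what is actually in the data.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharacterCleaner
{
    // Hypothetical mapping for illustration only.
    private static readonly Dictionary<char, char> Substitutions = new Dictionary<char, char>
    {
        { '\u0096', '-' }, // byte 0x96 is an en dash in Windows-1252; read as ISO-8859-1 it becomes U+0096
        { '\u2013', '-' }, // en dash
        { '\u2014', '-' }  // em dash
    };

    // Returns a copy of input with every mapped character replaced, in a single pass.
    public static string Clean(string input)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char replacement;
            // Append the mapped character when one exists, otherwise keep the original.
            result.Append(Substitutions.TryGetValue(c, out replacement) ? replacement : c);
        }
        return result.ToString();
    }
}
```

Usage would be along the lines of input = CharacterCleaner.Clean(input); before the value is written to the file.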
