
I have a client who has asked us to write a C# application that takes data from their database and outputs it to a .csv file. So far so good.

This DB contains some Unicode characters, and when the client opens the .csv in Excel those characters look "weird". Ex: x0096 looks like an A with a caret on top next to the Euro currency sign, when the client thinks it should look like a dash.

So I have been asked to make those characters look "not weird".

I have written a line of code for each weird character (I have about 12 lines like the one below).

input = input.Replace((char)weirdCharacter, (char)normalCharacter);

There has got to be a better way.

Eric Petroelje
Erik Volkening
  • So what encoding are the database and CSV file using? – Tyler Crompton Jan 23 '13 at 14:42
  • The first thought is to make an array of weird and normal characters, and loop through it (rather than one line per...). But it's still a bit kluge-ey. – Floris Jan 23 '13 at 14:43
  • [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html) – mellamokb Jan 23 '13 at 14:45
  • Can your answer be found here: http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net – Floris Jan 23 '13 at 14:47
  • Yeah. It was an encoding issue. I used a solution similar to what VJ mentioned below. The link that Floris mentioned had some code that was useful too. Thanks to all! – Erik Volkening Jan 23 '13 at 15:46

2 Answers


I had the same problem when I was generating HTML files. The solution for me was to change the encoding of my output file.

StreamWriter swHTMLPage = 
                new System.IO.StreamWriter(OutputFileName, false, Encoding.UTF8);

Once I added the Encoding.UTF8 parameter the characters started displaying correctly. I don't know if this can be applied to your solution though since Excel is involved, but I am betting it can be.
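For the CSV case, a minimal sketch of the same idea might look like the following. The `CsvExport` class and `WriteCsv` method are hypothetical names used only for illustration. The sketch assumes that Excel relies on the UTF-8 byte order mark (BOM) at the start of the file to recognise the encoding, so the BOM is requested explicitly via `UTF8Encoding(true)` rather than left to a default.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvExport
{
    // Hypothetical helper: writes the given lines to a file as UTF-8 with a BOM.
    public static void WriteCsv(string path, string[] lines)
    {
        // 'true' asks the encoding to emit the UTF-8 byte order mark (EF BB BF).
        var utf8WithBom = new UTF8Encoding(true);

        // append: false overwrites any existing file, so the BOM lands at offset 0.
        using (var writer = new StreamWriter(path, false, utf8WithBom))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }
}
```

A call such as `CsvExport.WriteCsv("export.csv", new[] { "id,name", "1,caf\u00e9 \u2013 bar" });` would then produce a file whose first three bytes are the BOM. If the characters are already wrong when they are read from the database, the decoding on the input side needs fixing as well; the output encoding alone cannot repair them.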

Vincent James

As Vincent James says, if this is an encoding issue, then the ideal way to fix this is to just use the right encoding when you decode/encode the value, but if that still doesn't work...

I think this is pretty straightforward. What do you think?

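// Requires: using System.Collections.Generic;
// Char literals use the \uXXXX escape; '\0x0096' is not a valid C# literal.
Dictionary<char, char> substitutions = new Dictionary<char, char> {
  {'\u0096', 'F'}, {'\u0101', 'O'}, {'\u0121', 'O'}, ...
};

foreach (KeyValuePair<char, char> pair in substitutions)
{
   // Strings are immutable: Replace returns a new string, so assign it back.
   input = input.Replace(pair.Key, pair.Value);
}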
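
As an alternative to calling `Replace` once per pair, the same lookup table can be applied in a single pass over the string. This is only a sketch: the `CharSubstitution` class, the `Apply` method, and the three mappings shown are hypothetical and chosen for illustration (they assume the stray characters are C1 control codes that Windows-1252 would display as a dash and single quotes); the real pairs depend on the data.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CharSubstitution
{
    // Hypothetical mapping for illustration only; replace with the pairs your data needs.
    public static readonly Dictionary<char, char> Substitutions = new Dictionary<char, char>
    {
        { '\u0096', '-' },   // byte 0x96 is an en dash in Windows-1252
        { '\u0091', '\'' },  // byte 0x91 is a left single quote in Windows-1252
        { '\u0092', '\'' },  // byte 0x92 is a right single quote in Windows-1252
    };

    // Walks the input once, swapping any character found in the map
    // and copying every other character unchanged.
    public static string Apply(string input, IDictionary<char, char> map)
    {
        var result = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char replacement;
            result.Append(map.TryGetValue(c, out replacement) ? replacement : c);
        }
        return result.ToString();
    }
}
```

Usage would be along the lines of `input = CharSubstitution.Apply(input, CharSubstitution.Substitutions);`. The single pass avoids allocating a new string for every entry in the table, which may matter for large exports.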
JLRishe