I'm parsing a number of text files that contain 99.9% ascii characters. Numbers, basic punctuation and letters A-Z (upper and lower case).
The files also contain names, which occasionally contain characters which are part of the extended ascii character set, for example umlauts Ü and cedillas ç.
I want to only work with standard ascii, so I handle these extended characters by processing any names through a series of simple replace() commands...
myString = myString.Replace("ç", "c");
myString = myString.Replace("Ü", "U");
This works with all the strange characters I want to replace except for Ø (capital O with a forward slash through it). I think this has the decimal equivalent of 157.
If I process the string character-by-character using ToInt32() on each character it claims the decimal equivalent is 65533 - well outside the normal range of extended ascii codes.
Questions
- why doesn't
myString.Replace("Ø", "O");
work on this character? - How can I replace "Ø" with "O"?
Other information - may be pertinent. Opening the file with Notepad shows the character as a "Ø". Comparison with other sources indicate that the data is correct (i.e. the full string is "Jørgensen" - a valid Danish name). Viewing the character in visual studio shows it as "�". I'm getting exactly the same problem (with this one character) in hundreds of different files. I can happily replace all the other extended characters I encounter without problems. I'm using System.IO.File.ReadAllLines()
to read all the lines into an array of strings for processing.