I am attempting to replace some undesirable characters in a string retrieved from an Excel spreadsheet. The reason being that our Oracle database is using the WE8ISO8859P1 character set, which does not define several characters that Excel "helpfully" inserts for you in text (curly quotes, em and en dashes, etc.) Since I have no control over the database or how the Excel spreadsheets are created I need to replace the characters with something else.
I retrieve the cell contents into a string thus:
string s = xlRange.get_Range("A1", Missing.Value).Value2.ToString().Trim();
Viewing the string in Visual Studio's Text Visualiser shows the text to be complete and correctly retrieved. Next I try and replace one of the undesirable characters (in this case the right-hand curly quote symbol):
s = Regex.Replace(s, "\u0094", "\u0022");
But it does nothing (Text Visualiser shows it still to be there). To try and verify that the character I want to replace is actually in there, I tried:
bool a = s.Contains("\u0094");
but it returns false. However:
bool b = s.Contains("”");
returns true.
My (somewhat lacking) understanding of strings in .NET is that they're encoded in UTF-16, whereas Excel would probably be using ANSI. So does that mean I need to change the encoding of the text as it comes out of Excel? Or am I doing something else wrong here? Any advice would be greatly appreciated. I have read and re-read all articles I can find about Unicode and encoding but am still none the wiser.