This question may reveal my ignorance regarding character encoding, so if it does, I would greatly appreciate information to correct that.
I am relaying strings from new applications to an old application. The old application only accepts ASCII characters (http://www.asciitable.com/). The old application also does not support certain characters such as backslashes. The new applications support more or less anything.
Let's say I have the string:
"Whatever - 1_夜_"
I need to convert that to something with only ASCII characters. For example, maybe something like:
"Whatever - 1_\u001cY_=???=???=???"
Then I want to replace the remaining illegal characters with substitution strings.
Ideally, any character that is encoded to ASCII should be able to be de-coded. That is, any unique input string will have a unique output string (no arbitrary inputs "abc" and "xyz" which are different produce the same result). An algorithm could convert the output string back to the input string.
This is what I've tried:
static string ConvertToAscii(string str)
{
var return_string = "";
foreach (var c in str)
{
if ((int)c < 128)
{
return_string += c;
}
else
{
var charBytes = BitConverter.GetBytes(c);
var ascii = Encoding.ASCII.GetString(charBytes);
return_string += ascii;
}
}
return return_string;
}
When I use this with the string I mentioned above, I get:
"Whatever - 1_\u001cY_=???=???=???"
That seems great - however, the "\u001cY" is apparently a single character, rather than a collection of ASCII characters. So my target database rejects it, and I am not able to figure out how to remove the "\" while leaving the remaining characters.
How can I convert any string into a collection of ASCII characters?