For those who have this problem and are looking for a 'fix all' solution... This is how I eventually fixed it:
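    // Requires: using System.Text.RegularExpressions;
    public static string RemoveTroublesomeCharacters(string inString)
    {
        if (inString == null)
        {
            return null;
        }

        // Matches any character outside the ASCII range (above U+007F).
        // RegexOptions.IgnoreCase is not needed for this pattern.
        Regex regex = new Regex(@"[^\u0000-\u007F]");
        // Matches are found against the original string, before anything is removed.
        Match charMatch = regex.Match(inString);

        // Pass 1: remove control characters.
        for (int i = 0; i < inString.Length; i++)
        {
            char ch = inString[i];
            if (char.IsControl(ch))
            {
                string matchedChar = ch.ToString();
                // Replace removes every occurrence of this character.
                inString = inString.Replace(matchedChar, string.Empty);
                // The string has shifted left, so re-check the same index;
                // otherwise the character after a removed one (e.g. the \n in \r\n) is skipped.
                i--;
            }
        }

        // Pass 2: remove each non-ASCII character the regex found.
        while (charMatch.Success)
        {
            string matchedChar = charMatch.ToString();
            inString = inString.Replace(matchedChar, string.Empty);
            charMatch = charMatch.NextMatch();
        }

        return inString;
    }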
I'll break it down in a bit more detail for those less experienced:
We first loop through every character of the entire string and use the IsControl method of char to determine if a character is a control character or not.
If a control character is found, copy that matched character to a string then use the Replace method to change the control character to an empty string. Rinse and repeat for the rest of the string.
Once we have looped through the entire string, we use the regex defined at the top, which matches any character outside the standard ASCII range (\u0000-\u007F), and again replace each matched character with an empty string. Doing this in a while loop means that for as long as charMatch.Success is true, the matched character is removed and we move on to the next match.
Finally, once all the unwanted characters have been removed, we return inString.
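For comparison, the two passes described above can be collapsed into one. This is only a minimal sketch, assuming the goal is simply to keep ASCII characters that are not control characters; the class name TextCleaner and the method name RemoveTroublesomeCharactersSinglePass are made up for illustration and are not part of the original fix.

```csharp
using System.Text;

public static class TextCleaner
{
    // Hypothetical single-pass variant; the names here are illustrative only.
    public static string RemoveTroublesomeCharactersSinglePass(string inString)
    {
        if (inString == null)
        {
            return null;
        }

        // Build the result once instead of calling Replace repeatedly.
        StringBuilder cleaned = new StringBuilder(inString.Length);
        foreach (char ch in inString)
        {
            // Keep only ASCII characters that are not control characters.
            if (ch <= '\u007F' && !char.IsControl(ch))
            {
                cleaned.Append(ch);
            }
        }

        return cleaned.ToString();
    }
}
```

Because the input string is never modified while it is being read, there is no index to keep in sync, and each character is examined exactly once.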
(Note: I have still not yet managed to figure out how to repopulate the TextBox with the new modified inString value, so if anyone can point out how it can be done that would be great)