I'm working on a c# project in which some data contains characters which are not recognised by the encoding. They are displayed like that:
"Some text � with special � symbols in it".
I have no control over the encoding process, also data come from files of various origins and various formats. I want to be able to flag data that contains such characters as erroneous or incomplete. Right now I am able to detect them this way:
if(myString.Contains("�"))
{
//Do stuff
}
While it does work, it doesn't feel quite right to use the weird symbol directly in the Contains function. Isn't there a cleaner way to do this ?
EDIT:
After checking back with the team responsible for reading the files, this is how they do it:
var sr = new StreamReader(filePath, true);
var content = sr.ReadToEnd();
Passing true
as a second parameter of StreamReader is supposed to detect the encoding from the file's BOM, and use it to read the content. It doesn't always work though, as some files don't bear that information, hence why their data is read incorrectly.
We've made some tests and using StreamReader(filePath, Encoding.Default)
instead appears to work for most if not all files we had issues with. Expectedly, files that were working before not longer work because they do not use the default encoding.
So the best solution for us would be to do the following: read the file trying to detect its encoding, then if it wasn't successful read it again with the default encoding.
The problem remains the same though: how do we check, after trying to detect the file's encoding, if data has been read incorrectly ?