Possible Duplicate:
How can I detect the encoding/codepage of a text file
I have plenty of .txt files in a directory. I need to find all the ones that are UTF-8 encoded. How can I achieve that?
You cannot detect an arbitrary text encoding in full generality, since you can never know what a random bunch of bytes was intended to mean. The only meaningful question you can ask is: "Can this data be interpreted correctly as UTF-8?"
The easiest way to answer that is to run your favourite encoding converter on the file and check for errors (e.g. iconv(), something from ICU, or whatever C# provides). If you want to do it manually, you have to go through the file byte by byte and check that everything forms a correct UTF-8 code sequence. Validation is about as much work as a full conversion (to UTF-32), because proper validation has to check not only that all bytes make up complete code sequences, but also that each sequence is in shortest form (no overlong encodings) and that the encoded value is itself a valid Unicode code point (not a UTF-16 surrogate and not above U+10FFFF).
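For the manual route, here is a minimal sketch of such a byte-by-byte validator, written in Python only for brevity (the function name `is_valid_utf8` is hypothetical). It assumes the RFC 3629 definition of UTF-8: sequences of 1 to 4 bytes, no overlong forms, no surrogates, nothing above U+10FFFF. As described above, it effectively decodes every code point in order to validate it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 (RFC 3629 rules)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:                 # 1-byte sequence (ASCII)
            i += 1
            continue
        if 0xC2 <= b0 <= 0xDF:        # 2-byte lead; 0xC0/0xC1 could only start overlong forms
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:      # 3-byte lead
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:      # 4-byte lead; 0xF5..0xFF would exceed U+10FFFF
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            return False              # stray continuation byte or invalid lead byte
        if i + length > n:
            return False              # sequence truncated at end of data
        for k in range(1, length):
            b = data[i + k]
            if b & 0xC0 != 0x80:      # continuation bytes must look like 10xxxxxx
                return False
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp:               # overlong (not shortest-form) encoding
            return False
        if 0xD800 <= cp <= 0xDFFF:    # UTF-16 surrogates are not valid scalar values
            return False
        if cp > 0x10FFFF:             # beyond the Unicode code space
            return False
        i += length
    return True
```

For example, `b"\xc0\xaf"` (an overlong encoding of `/`) and `b"\xed\xa0\x80"` (an encoded surrogate) are both rejected, while any string produced by `str.encode("utf-8")` is accepted.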
It's a fun little exercise to write this yourself, but the quickest solution would be to just use a library function.
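As a sketch of the library route applied to the original question, the following Python snippet uses the standard strict UTF-8 decoder as the validator and lists the matching files in a directory. The helper names and the `*.txt` default pattern are assumptions for illustration. In .NET, a `UTF8Encoding` constructed with `throwOnInvalidBytes: true` can play the same role as Python's strict decoder.

```python
from pathlib import Path
from typing import List


def is_utf8_file(path: Path) -> bool:
    """True if the file's bytes decode as UTF-8 (Python's decoder is strict by default)."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_utf8_files(directory: str, pattern: str = "*.txt") -> List[Path]:
    """Return, sorted, the files in `directory` matching `pattern` that are valid UTF-8."""
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and is_utf8_file(p)
    )
```

Note that a pure-ASCII file is also valid UTF-8, so it will be reported too; that matches the question "can this be interpreted as UTF-8?" rather than "was this written as UTF-8?". The sketch reads each file fully into memory, which is fine for typical text files.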