
I have several input text files, each with a different encoding: UTF8, UCS2 LE BOM, etc.

The content of each file is read with byte[] fileBytes = File.ReadAllBytes(filepath); and stored as a byte array.

I need to restore the original text content of these files, but I don't know how to determine the source encoding of each one. How can I detect it at runtime?

pitersmx
  • What I normally do is open the file with Notepad and then do a Save As. The encoding is displayed at the bottom of the Save As window. – jdweng May 24 '18 at 21:54
  • @jdweng Yeah, I know that; Notepad++ shows me the current encoding. But I need to get that information at runtime, when the file is being read. – pitersmx May 24 '18 at 21:55
  • If the files have BOMs then you can check those. If not, then the text encoding cannot be reliably detected. – David Heffernan May 25 '18 at 05:49
  • Whether the encoding is required depends on what you are doing with the data. If you are just passing data from input to output, then the safest thing to do is to use UTF8, which doesn't change the data. Once you choose another encoding method, the data gets changed and you will not be able to recover it. – jdweng May 25 '18 at 06:27
  • The accepted answer at https://stackoverflow.com/questions/3825390/effective-way-to-find-any-files-encoding helped me. Here is what I did: when I read the file bytes, I use that method to detect the encoding of the file, then I convert the byte array to UTF8 so that I always know which encoding it is in, and after that the byte array is stored in the db. When recreating the file, I just create a string using the UTF8 encoding and it works. – pitersmx May 25 '18 at 07:19
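
The approach discussed in the comments (check for a BOM, decode, then normalize to UTF8 before storing) could be sketched roughly as follows. This is only a minimal illustration, not the code from the linked answer: the class and method names (EncodingSniffer, DetectFromBom, Decode, ToUtf8) are hypothetical, and the fallback encoding used when no BOM is found is an assumption the caller has to supply, since an encoding without a BOM cannot be reliably detected.

```csharp
using System;
using System.Text;

// Hypothetical helper; names are illustrative, not from any library.
public static class EncodingSniffer
{
    // Returns the encoding indicated by a leading BOM, or null if no BOM is present.
    public static Encoding DetectFromBom(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));

        // Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks:
        // the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
        // Caveat: UTF-16 LE text whose first character is U+0000 would be
        // misread as UTF-32 LE here.
        if (bytes.Length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE && bytes[2] == 0x00 && bytes[3] == 0x00)
            return new UTF32Encoding(false, true);   // UTF-32 little endian
        if (bytes.Length >= 4 && bytes[0] == 0x00 && bytes[1] == 0x00 && bytes[2] == 0xFE && bytes[3] == 0xFF)
            return new UTF32Encoding(true, true);    // UTF-32 big endian
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
            return new UTF8Encoding(true);           // UTF-8 with BOM
        if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
            return new UnicodeEncoding(false, true); // UTF-16 LE ("UCS2 LE BOM")
        if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
            return new UnicodeEncoding(true, true);  // UTF-16 BE
        return null;
    }

    // Decodes the bytes to a string, skipping the BOM when one was found.
    // Without a BOM the caller-supplied fallback is used, which is only a guess.
    public static string Decode(byte[] bytes, Encoding fallback)
    {
        if (fallback == null) throw new ArgumentNullException(nameof(fallback));

        Encoding detected = DetectFromBom(bytes);
        if (detected == null)
            return fallback.GetString(bytes);

        int skip = detected.GetPreamble().Length;
        return detected.GetString(bytes, skip, bytes.Length - skip);
    }

    // Re-encodes the text as UTF-8 without a BOM, e.g. before storing it.
    public static byte[] ToUtf8(byte[] bytes, Encoding fallback)
    {
        return new UTF8Encoding(false).GetBytes(Decode(bytes, fallback));
    }
}
```

With the byte array from the question, usage might look like string text = EncodingSniffer.Decode(fileBytes, Encoding.UTF8); here UTF8 as the fallback is just an assumed default. Files without a BOM in some other encoding (for example a legacy code page) would still decode incorrectly, which matches the caveat raised in the comments.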
