I have a Stream containing an uploaded text file. The file can be in any encoding: ANSI, UTF-8 without BOM, or UTF-8 with BOM. It contains language-specific characters such as ąę. I need to save every file as UTF-8 with BOM on the server side, but I can't force all users to upload UTF-8 BOM files. The special characters must be preserved in the saved file, so ąę must stay ąę. How can I do this?

I have:

        using (Stream inputStream = file.InputStream)
        {
            byte[] bytes = ReadFully2(inputStream);
            string utf8string = .....what here?.....
            System.IO.File.WriteAllText("", utf8string, System.Text.Encoding.UTF8);
        }


public static byte[] ReadFully2(Stream input)
{
    input.Position = 0;
    using (MemoryStream ms = new MemoryStream())
    {
        input.CopyTo(ms);
        return ms.ToArray();
    }
}
Barney
  • Well, guessing the code page of a file without any other information can be quite tricky even for an expert. Doing it automatically in an acceptably robust way is, IMO, impossible. – Alberto Chiesa Feb 23 '17 at 17:50
  • The problem is reading the text using the correct encoding. If by ANSI you mean the well-defined ANSI codepage, you could read everything with UTF8Encoding - UTF8 and ANSI are the same for the ANSI character range. If by ANSI you mean "any local codepage" you have a problem. – Panagiotis Kanavos Feb 23 '17 at 17:52
  • You can force the users to specify a codepage, or you can try to read the file using different codepages and check for the Unicode Replacement character (�) in invalid encodings. Unfortunately, *some* encodings will result in garbled text instead of �. Forcing the users to specify the encoding is safer – Panagiotis Kanavos Feb 23 '17 at 17:54
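
The "try an encoding and check whether it decodes cleanly" idea from the comments could be sketched roughly as follows. This is only a heuristic, not a reliable detector: it skips a UTF-8 BOM if present, tries a strict UTF-8 decode that throws on invalid bytes, and otherwise falls back to a single code page. The class name `UploadDecoder`, the method `DecodeText`, and the default fallback of Windows-1250 (which covers ąę) are assumptions for illustration; the right fallback depends on where the users' files come from.

```csharp
using System;
using System.Text;

public static class UploadDecoder
{
    // Hypothetical helper: decode uploaded bytes as strict UTF-8,
    // falling back to an assumed ANSI code page when that fails.
    public static string DecodeText(byte[] bytes, int fallbackCodePage = 1250)
    {
        // Skip the UTF-8 BOM (EF BB BF) so it does not end up in the string.
        int offset = 0;
        if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        {
            offset = 3;
        }

        try
        {
            // throwOnInvalidBytes: true makes invalid sequences raise an
            // exception instead of silently becoming U+FFFD.
            var strictUtf8 = new UTF8Encoding(false, true);
            return strictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Assumption: anything that is not valid UTF-8 is in the fallback
            // code page. On .NET Core / .NET 5+ this call needs
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) first.
            return Encoding.GetEncoding(fallbackCodePage).GetString(bytes);
        }
    }
}
```

In the question's code this would replace the placeholder line with `string utf8string = UploadDecoder.DecodeText(bytes);`. `File.WriteAllText` with `System.Text.Encoding.UTF8` then writes the file with a BOM. Files in a different ANSI code page than the assumed one will still come out garbled, which is why asking users for the encoding remains the safer option.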