6

I would like to read a byte[] in C# using the current encoding of the file.

According to MSDN, the default encoding is UTF-8 when no encoding is passed to the constructor:

var reader = new StreamReader(new MemoryStream(data));

I have also tried this, but the file is still read as UTF-8:

var reader = new StreamReader(new MemoryStream(data), true);

I need to read the byte[] with the current encoding.

timss
Ori
  • Your question makes no sense - a byte array doesn't *have* an encoding. It's just binary data. If your file has binary data, you shouldn't use `StreamReader` at all... you should just use a `Stream`. – Jon Skeet May 16 '13 at 21:48
  • If we are talking about the encoding of the file, the question might be a duplicate of http://stackoverflow.com/questions/4520184/how-to-detect-the-character-encoding-of-a-text-file – Tz_ May 16 '13 at 21:51
  • @JonSkeet Would you consider writing an article about *binary data, strings, encodings, etc.* (if you haven't yet)? I see a lot of questions similar to this one (http://stackoverflow.com/questions/16597920/how-to-convert-binary-string-to-bytes-array#16597920), and it is very hard for us (non-native English speakers) to explain. – I4V May 16 '13 at 21:56
  • @I4V: Marc Gravell wrote a good one a while ago: http://marcgravell.blogspot.co.uk/2013/02/how-many-ways-can-you-mess-up-io.html – Jon Skeet May 16 '13 at 22:02
  • I have found this post: http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/0da1e58b-0531-44ba-b2af-0ebfd56566b7 – Ori May 16 '13 at 22:08
  • Some extra information: we get a text file via a web server call and save the data as a BLOB in an Oracle database. A Windows service then reads it with a StreamReader and runs some checks on the data. I need to find out what the encoding of the data is. Thank you for the answers. – Ori May 16 '13 at 22:15
  • `we are getting a text file via a web server call`. This probably means you know the correct encoding of the data. Just convert the data to a string using that encoding and then store it in the DB as a string. – I4V May 16 '13 at 22:20

2 Answers

14

A file has no encoding. A byte array has no encoding. A byte has no encoding. Encoding is something that transforms bytes to text and vice versa.

What you see in text editors and the like is actually program magic: the editor tries out different encodings and then guesses which one makes the most sense. The boolean parameter enables a limited form of this: StreamReader only checks for a byte order mark at the start of the stream and otherwise falls back to UTF-8. If this does not produce what you want, then the magic has failed.

var reader = new StreamReader(new MemoryStream(data), Encoding.Default);

will use the OS/locale-specific default encoding (on .NET Framework; on .NET Core and later, Encoding.Default is always UTF-8). If that is still not what you want, then you need to be completely explicit and tell the StreamReader exactly which encoding to use, for example (just as an example, since you said you did not want UTF-8):

var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8);
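
To illustrate the point that the bytes themselves carry no encoding, here is a minimal sketch that decodes the same byte array in two ways. The class and method names are made up for this example; `Encoding.GetEncoding("iso-8859-1")` is used only because that encoding is available without extra setup.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original answer.
public static class SameBytesDifferentText
{
    // The UTF-8 encoding of "café": 'c', 'a', 'f', then C3 A9 for 'é'.
    public static readonly byte[] Data = { 0x63, 0x61, 0x66, 0xC3, 0xA9 };

    // Interprets C3 A9 as one two-byte sequence.
    public static string AsUtf8()
    {
        return Encoding.UTF8.GetString(Data);       // "café"
    }

    // Interprets C3 and A9 as two separate single-byte characters.
    public static string AsLatin1()
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(Data);   // "cafÃ©"
    }
}
```

Both results are "correct" decodings of the same five bytes. Only knowledge from outside the array (a header, a BOM, a convention) can say which one was intended.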
Jan Dörrenhaus
  • I have checked again what exactly we get in the web service call, and it is a byte[]. As I understand from the answers, I cannot know the encoding of the data. So I will need to check whether the file contains a BOM or diacritics in order to select the correct encoding (UTF-8, UTF-8 with BOM, or 1252). Thank you for all the answers. – Ori May 16 '13 at 22:49
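
The check described in this comment could be sketched roughly as follows. This is only one possible heuristic, not something confirmed in the thread: `EncodingGuess.Decode` is a hypothetical helper, and the single-byte fallback encoding is passed in by the caller because the candidate set (UTF-8 or 1252) is an assumption about this particular data.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the .NET API.
public static class EncodingGuess
{
    public static string Decode(byte[] data, Encoding fallback)
    {
        // 1. UTF-8 BOM (EF BB BF) present: skip it and decode the rest as UTF-8.
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        {
            return Encoding.UTF8.GetString(data, 3, data.Length - 3);
        }

        // 2. No BOM: try strict UTF-8, which throws on invalid byte sequences
        //    instead of silently inserting replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            // 3. Not valid UTF-8: assume the caller's single-byte encoding.
            return fallback.GetString(data);
        }
    }
}
```

A call might look like `EncodingGuess.Decode(data, Encoding.GetEncoding(1252))`. On .NET Core and later, code page 1252 is only available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`. The heuristic can still guess wrong: pure ASCII data is valid in all three candidates, and a 1252 text may by coincidence also be valid UTF-8.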
0

I tried several ways of working out the encoding of a byte array, and it is not possible, because the byte array itself carries no encoding, as Jan mentions in his answer. However, if you are decoding with something like Encoding.GetString(byte[]), you can always decode the bytes as UTF-8 or ASCII/Unicode and then test the resulting string:

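// Requires: using System.Text;
public static bool IsUnicode(string input)
{
    // In ASCII, every non-ASCII character is replaced with a single '?'.
    var asciiBytesCount = Encoding.ASCII.GetByteCount(input);
    // In UTF-8, every non-ASCII character takes more than one byte.
    var unicodeBytesCount = Encoding.UTF8.GetByteCount(input);
    // The counts differ only if the string contains a non-ASCII character.
    return asciiBytesCount != unicodeBytesCount;
}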
HaveNoDisplayName