What's going on with this byte array?

Question

I have a byte array: 00 01 00 00 00 12 81 00 00 01 00 C8 00 00 00 00 00 08 5C 9F 4F A5 09 45 D4 CE

It is read via a StreamReader using UTF-8 encoding:

// Note: I can't change this code; too many components depend on it.
using (StreamReader streamReader = 
    new StreamReader(responseStream, Encoding.UTF8, false))
{
    string streamData = streamReader.ReadToEnd();
    if (requestData.Callback != null)
    {
        requestData.Callback(response, streamData);
    }
}

When that function runs, I get the following back (I converted the string to a byte array):

00 01 00 00 00 12 EF BF BD 00 00 01 00 EF BF BD 00 00 00 00 00 08 5C EF BF BD 4F EF BF BD 09 45 EF BF BD

Somehow I need to take what's returned to me and get it back to the right encoding and the right byte array, but nothing I've tried so far has worked.

Please be aware that I'm working with the limited WP7 API.

Hopefully you guys can help.

Thanks!

Update

If I run the following code, the result is almost right; the only thing wrong is that the fifth-to-last byte gets split.

byte[] writeBuf1 = System.Text.Encoding.UTF8.GetBytes(data);
string buf1string = System.Text.Encoding.BigEndianUnicode.GetString(writeBuf1, 0, writeBuf1.Length);
byte[] writeBuf = System.Text.Encoding.BigEndianUnicode.GetBytes(buf1string);

Can you show us the code that is writing/creating the array? – Emond, Jul 01 '11 at 04:30
Nope, it's coming from a third-party service; that's the exact data the service returns. Besides, I just want to get it back to what it's supposed to be (as it stands in the response stream). – John, Jul 01 '11 at 04:52
Then how do you know in what encoding and byte order the stream is written? – Emond, Jul 01 '11 at 05:29
Can you attach a network sniffer (Fiddler) to see what is actually being transmitted? – Emond, Jul 01 '11 at 05:31
Please note the array changed, but here's a screenshot of the Fiddler hex view: http://imageshack.us/photo/my-images/818/returnz.png/ – John, Jul 01 '11 at 05:39
http://stackoverflow.com/questions/25222973/weird-characters-in-url – trante, Aug 13 '14 at 22:24

score 41 · Accepted Answer · edited May 28 '21 at 14:59

The original byte array is not valid UTF-8. The StreamReader therefore replaces each invalid byte sequence with the replacement character U+FFFD. When that character is encoded back to UTF-8, it becomes the byte sequence EF BF BD. You cannot reconstruct the original byte values from the string, because every invalid byte maps to the same replacement character and the information about which byte it was is lost.
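To see the loss concretely, here is a minimal sketch that decodes a few bytes as UTF-8 and encodes the resulting string again. The input is a shortened, hypothetical sample modeled on the question's data, and the class and method names (`Utf8LossDemo`, `RoundTripUtf8`) are made up for illustration. It assumes the default `Encoding.UTF8` behavior of substituting U+FFFD for invalid input.

```csharp
using System;
using System.Text;

public static class Utf8LossDemo
{
    // Decodes raw bytes as UTF-8, then encodes the resulting string back to UTF-8.
    // Invalid input bytes come back as EF BF BD (the UTF-8 form of U+FFFD).
    public static byte[] RoundTripUtf8(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw, 0, raw.Length);
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        // Hypothetical sample: 0x81 and 0xC8 are not valid UTF-8 in this position.
        byte[] raw = { 0x12, 0x81, 0x00, 0xC8, 0x00 };

        byte[] result = RoundTripUtf8(raw);

        // prints "12-EF-BF-BD-00-EF-BF-BD-00"
        Console.WriteLine(BitConverter.ToString(result));
    }
}
```

Both 0x81 and 0xC8 end up as the same three bytes, so nothing in the string tells them apart afterwards.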

edited May 28 '21 at 14:59

StackzOfZtuff


answered Jul 01 '11 at 06:05

Roland Illig


That's what I was afraid of... So the only way to really not lose the data is to figure out what the encoding is and read it that way? Unfortunately, for some reason I can't just read a byte array; the Stream requires a StreamReader to read it. – John Jul 01 '11 at 06:18

Yes, and when you are in doubt, use `ISO-8859-1`, so you will get a simple 1:1 mapping from bytes to characters. Just out of curiosity: why would anyone want to read a byte stream like this (which is obviously non-character data) as a character stream? – Roland Illig Jul 01 '11 at 06:24
Can't you ask the source of the stream for a specification? – Emond Jul 01 '11 at 06:52
Everything is (and has been) character data except for this one new part. Either way, I just added some overrides to optionally get the actual byte[], and all seems well with the ISO-8859-1 encoding. Thanks! – John Jul 01 '11 at 13:34

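The ISO-8859-1 approach from the comments above can be sketched roughly as follows. This is an illustration only: the names `Latin1RoundTrip` and `ReadBytesViaReader` are hypothetical, the input is a made-up sample, and it assumes `Encoding.GetEncoding("ISO-8859-1")` is available on the target platform (John's comment suggests it worked for him on WP7, but that is not verified here).

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1RoundTrip
{
    // Reads the stream as ISO-8859-1 text, then converts the string back to bytes.
    // Each byte maps to exactly one character, so the round trip is lossless.
    public static byte[] ReadBytesViaReader(Stream stream)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (StreamReader reader = new StreamReader(stream, latin1, false))
        {
            string text = reader.ReadToEnd();
            return latin1.GetBytes(text);
        }
    }

    public static void Main()
    {
        // Hypothetical sample containing bytes that are invalid as UTF-8.
        byte[] raw = { 0x12, 0x81, 0x00, 0xC8, 0x9F, 0xA5 };

        byte[] recovered = ReadBytesViaReader(new MemoryStream(raw));

        // prints "12-81-00-C8-9F-A5"
        Console.WriteLine(BitConverter.ToString(recovered));
    }
}
```

The string produced this way is only a carrier for the bytes; it should not be displayed or treated as meaningful text.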
Wow, holy shit, so these bytes are pretty good markers of incorrect encoding being used! – mike nelson Dec 14 '15 at 18:55
