
Additional information: Unable to translate Unicode character \uDFFF at index 195 to specified code page.

I made an algorithm whose results are binary values (of different lengths). I transformed them into a uint, then into chars, and saved them into a StringBuilder, as you can see below:

// tmp_chars holds a binary string such as "101101"; parse it as base 2
uint n = Convert.ToUInt16(tmp_chars, 2);
_koded_text.Append(Convert.ToChar(n));

My problem is that when I try to save those values into a .txt file I get the previously mentioned error.

StreamWriter file = new StreamWriter(filename);
file.WriteLine(_koded_text);
file.Close();

What I am saving is this: "忿췾᷿]볯褟ﶞ痢ﳻ��伞ﳴ㿯ﹽ翼蛿㐻ﰻ筹��﷿₩マ랿鳿⏟麞펿"... which are some weird signs.

What I need is to convert those binary values into some kind of string of chars and save it to a .txt file. I saw somewhere that converting to UTF-8 should help, but I don't know how. Would changing the file's encoding help too?

Bojan Kogoj
    Are you actually trying to write a bunch of ints to a txt file? Do you want those ints to be human-readable in the txt file? If it's really a byte array, would Base64 encoding help? – Dialecticus May 07 '11 at 13:49

2 Answers


You cannot transform binary data to a string directly. The Unicode characters in a string are encoded using UTF-16 in .NET. That encoding uses two bytes per character, providing 65536 distinct values. Unicode, however, has over one million codepoints. To make that work, the Unicode codepoints above \uffff (above the BMP, the Basic Multilingual Plane) are encoded with a surrogate pair. The first one has a value between 0xd800 and 0xdbff, the second between 0xdc00 and 0xdfff. That provides 2^(10+10) = 1,048,576 additional codes.
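The failure can be reproduced directly. A minimal sketch (the class and variable names are illustrative): a lone surrogate such as \uDFFF cannot be encoded, and an encoder configured to throw on invalid data rejects it:

```csharp
using System;
using System.Text;

class SurrogateDemo
{
    static void Main()
    {
        // "\uDFFF" is a lone low surrogate: half of a surrogate pair
        // with no matching high surrogate in front of it.
        string broken = "\uDFFF";

        // A UTF-8 encoder configured to throw on invalid input.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);

        try
        {
            strictUtf8.GetBytes(broken);
            Console.WriteLine("encoded (unexpected)");
        }
        catch (EncoderFallbackException e)
        {
            // This is what happens: the lone surrogate is rejected.
            Console.WriteLine("cannot encode: " + e.Message);
        }
    }
}
```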

You can perhaps see where this leads: in your case the code detects a low surrogate value (0xdfff) that isn't preceded by a high surrogate. That's illegal. There are lots more possible mishaps: several codepoints are unassigned, and several are diacritics that get mangled when the string is normalized.

You just can't make this work. Base64 encoding is the standard way to carry binary data across a text stream. It uses 6 bits per character, so 3 bytes require 4 characters. The character set is ASCII, so the odds of the receiving program decoding the characters back to binary incorrectly are minimal. Only a decades-old IBM mainframe that uses EBCDIC could get you into trouble. Or just plain avoid encoding to text and keep it binary.
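For comparison, a Base64 round trip never produces invalid characters. A short sketch (the sample bytes are arbitrary, not from the question):

```csharp
using System;

class Base64Demo
{
    static void Main()
    {
        // Arbitrary binary data, including bytes that are not valid text.
        byte[] data = { 0x01, 0x02, 0x03, 0xFF, 0xFE };

        // Encode: every 3 bytes become 4 ASCII characters.
        string encoded = Convert.ToBase64String(data);
        Console.WriteLine(encoded);   // safe to write to any text file

        // Decode: recover the exact original bytes.
        byte[] decoded = Convert.FromBase64String(encoded);
        Console.WriteLine(decoded.Length);   // 5
    }
}
```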

Hans Passant
  • A 9873-char long "binary string" becomes a 13165-char long Base64 string. Any other way to do this? Problem is I cannot save it to a binary file, because this file contains more than just this.. – Bojan Kogoj May 07 '11 at 18:41
  • Your hard disk surely has another couple of billion unused bytes available? A terabyte costs less than a hundred bucks, you cannot possibly fret over a handful of kilobytes. Certainly not worth another encoding scheme that reduces it by 10%. – Hans Passant May 07 '11 at 19:44
  • Well no, that is not the problem. It's just an algorithm that I have to make. – Bojan Kogoj May 07 '11 at 23:36
  • Is this a homework assignment? Why do you *have* to make an algorithm? – Hans Passant May 07 '11 at 23:56

Since you're trying to encode binary data to a text stream, this SO question already contains the answer: "How do I encode something as Base64?" From there, plain ASCII/ANSI text is fine for the output encoding.
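Combined with the file-writing code from the question, a sketch could look like this (the filename and sample bytes are placeholders):

```csharp
using System;
using System.IO;

class SaveBase64
{
    static void Main()
    {
        byte[] packedBits = { 0xB1, 0x07, 0xDF };   // whatever the algorithm produced
        string filename = "koded.txt";              // placeholder path

        // Base64 is plain ASCII, so any text encoding can store it safely.
        using (var file = new StreamWriter(filename))
        {
            file.WriteLine(Convert.ToBase64String(packedBits));
        }

        // Reading it back recovers the original bytes exactly.
        string line = File.ReadAllLines(filename)[0];
        byte[] restored = Convert.FromBase64String(line);
        Console.WriteLine(restored.Length);   // 3
    }
}
```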

Jason D