I have a Unicode text containing some Unicode characters, say: "Hello, world! this paragraph has some unicode characters."

I want to convert this paragraph to a binary string, i.e. binary digits stored in a value of type string. After converting, I also want to convert that binary string back to the original Unicode string.

Khuram Nawaz
  • Duplicate of http://stackoverflow.com/questions/1615559/convert-a-unicode-string-to-an-escaped-ascii-string –  Jun 20 '16 at 11:44
  • @buffjape That is something else; it's not a duplicate of what I want. What I want is shown in the following example: Input: Hi, this text is in unicode. Output: 11000010111100101111 (digits in string datatype) Output2: Hi, this text is in unicode. I hope this explains my problem. – Khuram Nawaz Jun 20 '16 at 18:11
  • Is the example you are providing here exact? "Hi, this text is in unicode." cannot correspond to "11000010111100101111" in any possible representation. – pijemcolu Jun 21 '16 at 10:54
  • @pijemcolu If you look at the marked answer, it is exactly what I wanted. – Khuram Nawaz Jun 21 '16 at 10:58

2 Answers

If you're simply looking for a way to encode a string into a byte[] and decode it again, rather than into actual binary digits, then I would use the System.Text namespace.

The example from MSDN:

  string unicodeString = "This string contains the unicode character Pi (\u03a0)";

  // Create two different encodings.
  Encoding ascii = Encoding.ASCII;
  Encoding unicode = Encoding.Unicode;

  // Convert the string into a byte array.
  byte[] unicodeBytes = unicode.GetBytes(unicodeString);

  // Perform the conversion from one encoding to the other.
  // ASCII cannot represent Pi, so that character is replaced with '?'.
  byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

  // Convert the new byte[] into a char[] and then into a string.
  char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
  ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
  string asciiString = new string(asciiChars);

  // Display the strings created before and after the conversion.
  Console.WriteLine("Original string: {0}", unicodeString);
  Console.WriteLine("Ascii converted string: {0}", asciiString);

Don't forget the using directives:

using System;
using System.Text;
pijemcolu

Since there are several encodings for the Unicode character set, you have to pick one: UTF-8, UTF-16, UTF-32, etc. Say you pick UTF-8. You have to use the same encoding in both directions.
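To illustrate why the encoding has to match, here is a minimal sketch showing that the same character produces different bytes under different encodings. The class name EncodingChoiceDemo and the helper Hex are made up for this illustration; only the System.Text and BitConverter calls are real framework APIs.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original answer.
public static class EncodingChoiceDemo
{
    // Returns the encoded bytes as hyphen-separated hex, e.g. "CE-A0".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        string pi = "\u03a0"; // Greek capital letter Pi

        Console.WriteLine(Hex(Encoding.UTF8, pi));    // CE-A0
        Console.WriteLine(Hex(Encoding.Unicode, pi)); // A0-03 (UTF-16, little-endian)
        Console.WriteLine(Hex(Encoding.UTF32, pi));   // A0-03-00-00

        // Decoding with the encoding that produced the bytes restores the text.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(pi);
        Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == pi);    // True

        // Decoding the same bytes as UTF-16 yields a different character.
        Console.WriteLine(Encoding.Unicode.GetString(utf8Bytes) == pi); // False
    }
}
```

The binary digits you end up with are just these bytes written in base 2, so a string produced with one encoding cannot be read back correctly with another.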

To convert to a binary string:

String.Join(
    String.Empty, // running them all together makes it tricky.
    Encoding.UTF8
        .GetBytes("Hello, world! this paragraph has some unicode characters.")
        .Select(byt => Convert.ToString(byt, 2).PadLeft(8, '0'))) // must ensure 8 digits.

And back again:

Encoding.UTF8.GetString(
    Regex.Split(
        "010010000110010101101100011011000110111100101100001000000111011101101111011100100110110001100100001000010010000001110100011010000110100101110011001000000111000001100001011100100110000101100111011100100110000101110000011010000010000001101000011000010111001100100000011100110110111101101101011001010010000001110101011011100110100101100011011011110110010001100101001000000110001101101000011000010111001001100001011000110111010001100101011100100111001100101110"
        ,"(.{8})") // this is the consequence of running them all together.
    .Where(binary => !String.IsNullOrEmpty(binary)) // keeps the matches; drops empty parts 
    .Select(binary => Convert.ToByte(binary, 2))
    .ToArray())
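
The two expressions above can also be wrapped into a pair of helper methods. This is only a sketch under a few assumptions: the names BinaryText, ToBinaryString and FromBinaryString are invented for illustration, UTF-8 is assumed as the encoding, and the decoder cuts the input into 8-character chunks with Substring instead of Regex.Split.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, not part of the original answer.
public static class BinaryText
{
    // Encodes text as UTF-8 and writes each byte as exactly 8 binary digits.
    public static string ToBinaryString(string text)
    {
        return string.Concat(
            Encoding.UTF8.GetBytes(text)
                .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));
    }

    // Reads the digits back in groups of 8 and decodes the bytes as UTF-8.
    public static string FromBinaryString(string bits)
    {
        if (bits.Length % 8 != 0)
            throw new ArgumentException("Length must be a multiple of 8.", nameof(bits));

        byte[] bytes = new byte[bits.Length / 8];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(bits.Substring(i * 8, 8), 2);

        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        string bits = ToBinaryString("Hi");
        Console.WriteLine(bits);                   // 0100100001101001
        Console.WriteLine(FromBinaryString(bits)); // Hi
    }
}
```

Because every byte is padded to 8 digits, the decoder can rely on fixed-width chunks, which is the same property the regex version depends on.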
Tom Blodget