
How to encode UTF8 text to Unicode?

string text_txt = "пÑивеÑ";    
byte[] bytesUtf8 = Encoding.Default.GetBytes(text_txt);
text_txt = Encoding.UTF8.GetString(bytesUtf8);

The problem is the output: п�?иве�

I need the output: привет

Using this site: https://www.branah.com/unicode-converter, if you enter "пÑивеÑ" in the "UTF-8 text (Example: a 中 Я)" box, it shows "привет" as the Unicode text.

Please give some advice. Thanks.

Rahul Sharma
J.Col

3 Answers

    // Assumes every char is in U+0000..U+00FF (text mis-decoded as Latin-1).
    byte[] utf8Bytes = new byte[text_txt.Length];
    for (int i = 0; i < text_txt.Length; ++i)
    {
        // Debug.Assert(text_txt[i] <= 255, "the char must be in byte's range");
        utf8Bytes[i] = (byte)text_txt[i];   // keeps only the low 8 bits of each char
    }
    text_txt = Encoding.UTF8.GetString(utf8Bytes);

Adapted from the answer to: How to convert a UTF-8 string into Unicode?

Solution
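The loop above effectively performs ISO-8859-1 (Latin-1) encoding by hand: it only works if the mojibake was produced by a Latin-1 decode, where every char is in U+0000..U+00FF. The sketch below is not the answerer's code; it assumes that scenario, and the names `Latin1Repair`, `RepairByCast` and `RepairByLatin1` are hypothetical. It turns the commented-out assert into a real check and shows that the built-in code page 28591 does the same job.

```csharp
using System;
using System.Text;

public static class Latin1Repair
{
    // Same idea as the answer's loop: map each char (U+0000..U+00FF) to one byte,
    // then decode those bytes as UTF-8. Rejects chars that do not fit in a byte.
    public static string RepairByCast(string mojibake)
    {
        byte[] bytes = new byte[mojibake.Length];
        for (int i = 0; i < mojibake.Length; i++)
        {
            if (mojibake[i] > 0xFF)
                throw new ArgumentException(
                    $"char U+{(int)mojibake[i]:X4} is outside the byte range");
            bytes[i] = (byte)mojibake[i];
        }
        return Encoding.UTF8.GetString(bytes);
    }

    // ISO-8859-1 (code page 28591) maps U+0000..U+00FF to the same byte values,
    // so the built-in encoding does the same job as the manual loop.
    public static string RepairByLatin1(string mojibake)
    {
        byte[] bytes = Encoding.GetEncoding(28591).GetBytes(mojibake);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // Build Latin-1 mojibake: the UTF-8 bytes of "привет" read as ISO-8859-1.
        // Bytes 0x80 and 0x82 become the invisible control chars U+0080 and U+0082.
        string latin1Mojibake = Encoding.GetEncoding(28591)
            .GetString(Encoding.UTF8.GetBytes("привет"));

        Console.WriteLine(RepairByCast(latin1Mojibake) == "привет");   // True
        Console.WriteLine(RepairByLatin1(latin1Mojibake) == "привет"); // True

        // Windows-1252 mojibake contains '€' (U+20AC) and '‚' (U+201A), which
        // do not fit in a byte, so a plain (byte) cast would silently corrupt them.
        try
        {
            RepairByCast("Ð¿Ñ€Ð¸Ð²ÐµÑ‚");
        }
        catch (ArgumentException e)
        {
            Console.WriteLine(e.Message); // char U+20AC is outside the byte range
        }
    }
}
```

If the text was instead mis-decoded as Windows-1252, the cast approach is the wrong tool; re-encoding with code page 1252 is needed.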

Well, you probably mean this:

// Forward: UTF-8 bytes of "привет" read as WIN-1252 (this produces the mojibake)

  byte[] data = Encoding.UTF8.GetBytes("привет");
  string text = Encoding.GetEncoding(1252).GetString(data);

// Reverse: mojibake given in WIN-1252, turned back into bytes and read as UTF-8

  byte[] reversedData = Encoding.GetEncoding(1252).GetBytes("Ð¿Ñ€Ð¸Ð²ÐµÑ‚");
  string reversedText = Encoding.UTF8.GetString(reversedData);

  Console.WriteLine($"{string.Join(" ", data)} <=> {text}");
  Console.WriteLine(reversedText);

Outcome:

208 191 209 128 208 184 208 178 208 181 209 130 <=> Ð¿Ñ€Ð¸Ð²ÐµÑ‚
привет

Please note that you've omitted the € and ‚ characters:

 Ð¿Ñ Ð¸Ð²ÐµÑ - actual string
 Ð¿Ñ€Ð¸Ð²ÐµÑ‚ - should be
Dmitry Bychenko
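The Windows-1252 round trip above can be wrapped in a small self-contained program. This is a minimal sketch, not the answerer's code, assuming .NET 5 or later: there, `Encoding.GetEncoding(1252)` throws `NotSupportedException` unless `CodePagesEncodingProvider` is registered first (older runtimes may need the System.Text.Encoding.CodePages package). The class name `Cp1252Repair` is hypothetical. It also shows why the string quoted as "actual" cannot be fully repaired once € and ‚ are missing.

```csharp
using System;
using System.Text;

public static class Cp1252Repair
{
    // On .NET Core / .NET 5+, code page 1252 is not available until the
    // code-pages provider is registered; do it once, before first use.
    static Cp1252Repair()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Undo "UTF-8 bytes shown as Windows-1252": turn the mojibake back into
    // its original bytes with code page 1252, then decode them as UTF-8.
    public static string Repair(string mojibake)
    {
        byte[] originalBytes = Encoding.GetEncoding(1252).GetBytes(mojibake);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        // Complete mojibake: repairs cleanly.
        string full = Repair("Ð¿Ñ€Ð¸Ð²ÐµÑ‚");
        Console.WriteLine(full == "привет"); // True

        // Without '€' (0x80) and '‚' (0x82), two bytes of the UTF-8 sequence are
        // lost; the decoder substitutes U+FFFD for the broken sequences.
        string partial = Repair("Ð¿Ñ Ð¸Ð²ÐµÑ");
        Console.WriteLine(partial.IndexOf('\uFFFD') >= 0); // True
    }
}
```

Putting the registration in a static constructor means `Repair` works no matter which code calls it first.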

You need to be explicit about the encoding you use to convert to bytes (System.Text.Encoding.UTF8.GetBytes), e.g.:

using System;
using System.Text;

public class Program {
    public static void Main() {
        string text_txt = "пÑивеÑ";

        byte[] bytesUtf8 = Encoding.UTF8.GetBytes(text_txt);
        text_txt = Encoding.UTF8.GetString(bytesUtf8);

        Console.WriteLine(text_txt);
    }
}

This way UTF-8 is used to both encode and decode the string, which ensures the same string comes back from the GetString method.

Fehr
  • Did you try it in your console? Nothing was converted. – J.Col Feb 20 '20 at 08:28
  • His original post converted "пÑивеÑ" to п�?иве�. I assumed this was his issue, but I'm otherwise unclear on why the poster expects the input to produce "привет" rather than the input once reversed. – Fehr Feb 20 '20 at 08:33
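The exchange in the comments can be checked with a short sketch: encoding and then decoding with the same lossless encoding returns the input unchanged, so a UTF-8 round trip alone never turns the mojibake into "привет". The class and method names (`RoundTripDemo`, `Utf8RoundTrip`) are hypothetical, and the Windows-1252 mojibake literal is assumed as the input.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encode to UTF-8 bytes and decode them again with the same encoding.
    public static string Utf8RoundTrip(string s) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(s));

    public static void Main()
    {
        string mojibake = "Ð¿Ñ€Ð¸Ð²ÐµÑ‚";
        string roundTripped = Utf8RoundTrip(mojibake);

        // The mojibake survives the round trip untouched...
        Console.WriteLine(roundTripped == mojibake); // True
        // ...so the result is still not "привет".
        Console.WriteLine(roundTripped == "привет"); // False
    }
}
```

To change the text, the bytes have to be produced with one encoding (the one that created the mojibake) and decoded with another (UTF-8).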