
I want to encode and then decode a string that contains multilingual characters, where the languages, the length, and the character positions (for example, Chinese characters at indexes 8-10) are unknown.

Is it even possible to have a "universal" encoder? Or some algorithm that knows how to decode this?

Searching the web turned up only solutions that require knowing where the special characters are and what language they belong to, and I can't even know the language itself.

Any ideas?

EDIT: Example: a string that consists of several languages, such as:

"Hello {CHINESE} my {LATIN} is rusted"

which consists of English, Chinese, and Latin.

But when I do

var test = ASCIIEncoding.ASCII.GetBytes(someStr);

and then

ASCIIEncoding.ASCII.GetString(test)

the "special characters" (i.e., non-English characters) are converted to question marks.

Andrey Korneyev
Tomer Something
  • What do you mean by "encode"? What context makes some characters "special"? No character is any more special than any other other than in a given context (e.g. `漢` is special in URLs but not in HTML). – Jon Hanna Mar 01 '17 at 14:54
  • Can you provide some examples? Right now it is unclear what is your concrete problem and what is your goal. – Andrey Korneyev Mar 01 '17 at 14:54
  • UTF16 (and UTF8) are perfectly good encodings that support all the characters that you'll use :-) – xanatos Mar 01 '17 at 14:57
  • Ok... So don't use `ASCIIEncoding`? It is a relic of a bygone era... Use `Encoding.UTF8.GetBytes` and `Encoding.UTF8.GetString`. – xanatos Mar 01 '17 at 15:05

1 Answer


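Don't use ASCII encoding: it covers only the 128 characters U+0000 to U+007F, so it can't represent mixed-language text, and anything outside that range is replaced with a question mark.

Use a Unicode encoding instead (`Encoding.Unicode` is UTF-16; `Encoding.UTF8` works just as well):

// Encoding.Unicode is UTF-16 (little-endian); it round-trips any valid Unicode text.
var test = Encoding.Unicode.GetBytes(someStr);
var test1 = Encoding.Unicode.GetString(test);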
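
To see the difference side by side, here is a minimal sketch that round-trips the same mixed-script string through ASCII, UTF-8, and UTF-16. The `RoundTripDemo` class, the `RoundTrip` helper, and the sample text are made up for illustration; only the `System.Text.Encoding` calls come from the framework.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encode a string to bytes, then decode it
    // back with the same encoding.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Sample text (an assumption, not taken from the question):
        // English, two Chinese characters (U+4F60, U+597D), and a Latin
        // word containing U+012B. Escapes are used so the result does
        // not depend on how the source file itself is saved.
        string someStr = "Hello \u4F60\u597D my Lat\u012Bna is rusted";

        // ASCII cannot represent the non-ASCII characters, so they come
        // back as '?' and the round trip is lossy.
        Console.WriteLine(RoundTrip(Encoding.ASCII, someStr) == someStr);   // False

        // UTF-8 and UTF-16 can represent every Unicode character, so the
        // decoded string equals the original.
        Console.WriteLine(RoundTrip(Encoding.UTF8, someStr) == someStr);    // True
        Console.WriteLine(RoundTrip(Encoding.Unicode, someStr) == someStr); // True
    }
}
```

Neither Unicode encoding needs to know which languages the string contains or where the non-English characters sit. The one requirement is that the bytes are decoded with the same encoding that produced them.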
Andrey Korneyev