The code presented takes a native .NET string (which is UTF-16 internally), encodes it to Windows-1256, and then misinterprets that result as if it were UTF-8. So, of course, decoding it as UTF-8 produces '?' (replacement characters) for the non-ASCII characters, as they were never encoded as UTF-8 to begin with. The code is not doing what the question asks for.
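To make the failure concrete, here is a minimal sketch of that broken round-trip (the sample string and variable names are hypothetical; on .NET Core/5+ the code-pages provider from the System.Text.Encoding.CodePages package must be registered first):

using System.Text;

// Needed on .NET Core/5+ to make code page 1256 available (built in on .NET Framework):
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

string original = "مرحبا"; // a .NET string is UTF-16 internally
byte[] win1256Bytes = Encoding.GetEncoding(1256).GetBytes(original);

// Bug: these bytes are Windows-1256, not UTF-8. The Arabic bytes form invalid
// UTF-8 sequences, so they decode to U+FFFD, which typically displays as '?':
string mangled = Encoding.UTF8.GetString(win1256Bytes);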
The correct way to convert Windows-1256 (or any other encoding) to UTF-8 is to first decode the source bytes to a UTF-16 string using the original encoding, and then encode that string to UTF-8, e.g.:
byte[] Win1256Data = ...;
// Step 1: decode the Windows-1256 bytes to a UTF-16 string using the original encoding
string s = Encoding.GetEncoding(1256).GetString(Win1256Data);
// Step 2: encode that string to UTF-8
byte[] Utf8Data = Encoding.UTF8.GetBytes(s);
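A self-contained sketch of that two-step conversion in context (the sample text is hypothetical; the round-trip at the end simply verifies the characters survive):

using System;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); // .NET Core/5+ only

Encoding win1256 = Encoding.GetEncoding(1256);

// Simulate incoming Windows-1256 data (hypothetical sample text):
byte[] win1256Data = win1256.GetBytes("سلام");

string s = win1256.GetString(win1256Data);   // decode with the ORIGINAL encoding
byte[] utf8Data = Encoding.UTF8.GetBytes(s); // re-encode as UTF-8

Console.WriteLine(Encoding.UTF8.GetString(utf8Data)); // prints: سلام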
Alternatively, the Encoding class has a Convert() method to handle the intermediate conversion for you:
byte[] Win1256Data = ...;
// Convert() decodes with the source encoding and re-encodes with the target encoding:
byte[] Utf8Data = Encoding.Convert(Encoding.GetEncoding(1256), Encoding.UTF8, Win1256Data);
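Either way the result is the same, as Convert() performs that decode-then-encode internally; use the first form when you also need the intermediate string.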