
Possible Duplicate:
C++ & Boost: encode/decode UTF-8

I have a wstring that (incorrectly) holds UTF-8-encoded text, and I need to convert it into a new, corrected wstring.

PHP's utf8_decode function handles this perfectly. I can also do it through iconv:

from:

# cat wtf  | grep 1856 | awk '{print $2}'
å°ç³å·ãçç¾

to:

# cat wtf  | grep 1856 | awk '{print $2}' | iconv -f utf8 -t ISO-8859-1
小石川 玉美

In C# I was able to get this behavior with:

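using System.Text;
// Map each char back to one byte via ISO-8859-1 (code page 28591), then decode those bytes as UTF-8.
public static string Utf8Decode(string utf8me) {
    return Encoding.UTF8.GetString(Encoding.GetEncoding(28591).GetBytes(utf8me));
}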

Most of my searching has turned up only Windows-specific workarounds. Since I am on Linux, I assume I will want to use iconv, but I am not sure how to do this in C++.
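
To make the goal concrete, here is a rough C++ sketch of what the C# one-liner does: narrow each character to a single ISO-8859-1 byte, then decode the resulting bytes as UTF-8. The names `to_latin1_bytes`, `decode_utf8`, and `fix_mojibake` are hypothetical, and the sketch assumes a 32-bit `wchar_t` (as on Linux with glibc) and input in which every character is in the range 0..255. It uses a small hand-written decoder rather than iconv, and it does not reject overlong or surrogate encodings.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: treat each wchar_t as one ISO-8859-1 byte (0..255).
std::string to_latin1_bytes(const std::wstring& in) {
    std::string bytes;
    bytes.reserve(in.size());
    for (wchar_t wc : in) {
        if (static_cast<unsigned long>(wc) > 0xFFul)
            throw std::range_error("character outside the ISO-8859-1 range");
        bytes.push_back(static_cast<char>(static_cast<unsigned char>(wc)));
    }
    return bytes;
}

// Hypothetical helper: decode UTF-8 bytes into code points stored in a
// wstring. Assumes wchar_t is wide enough to hold any code point.
std::wstring decode_utf8(const std::string& bytes) {
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}

// Equivalent in spirit to the C# Utf8Decode above.
std::wstring fix_mojibake(const std::wstring& garbled) {
    return decode_utf8(to_latin1_bytes(garbled));
}
```

Calling `fix_mojibake` on a wstring whose characters are the bytes 0xE5 0xB0 0x8F should yield the single character U+5C0F, the first character of the expected output shown earlier.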

  • What type of `wstring`? UCS-2? UTF-32? – Pubby Dec 20 '12 at 08:41
  • `utf8_decode` converts UTF-8 encoded text to ISO-8859-1 encoded text. ISO-8859 cannot encode the characters "小石川 玉美". You have a really strong case of screwed up encodings! [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze Dec 20 '12 at 11:42

1 Answer