I want to convert a string encoded in a double-byte code page into a UTF-16 string using std::codecvt<wchar_t, char, std::mbstate_t>::in() on the Microsoft standard library implementation (MSVC11). For example, consider the following program:

#include <iostream>
#include <locale>

int main()
{
    // KATAKANA LETTER A (U+30A2) in Shift-JIS (Codepage 932)
    // http://msdn.microsoft.com/en-us/goglobal/cc305152
    char const cs[] = "\x83\x41";

    std::locale loc = std::locale("Japanese");

    // Output: "Japanese_Japan.932" (as expected)
    std::cout << loc.name() << '\n';

    typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_t;
    cvt_t const& codecvt = std::use_facet<cvt_t>(loc);
    wchar_t out = 0;
    std::mbstate_t mbst = std::mbstate_t();
    char const* mid;
    wchar_t* outmid;

    // Output: "2" (error) (expected: "0" (ok))
    std::cout << codecvt.in(
        mbst, cs,   cs + 2,   mid,
              &out, &out + 1, outmid) << '\n';

    // Output: "0" (expected: "30a2")
    std::cout << std::hex << out << '\n';
}

When debugging, I found that in() ends up calling the internal _Mbrtowc() function (crt\src\xmbtowc.c), passing the internal (C?) part of the std::locale, initialized with {_Page=932 _Mbcurmax=2 _Isclocale=0 ...}. Here ... stands for the _Isleadbyte member, which is initialized to an array of 32 zeros (of type unsigned char), and this seems to be the problem. When the function processes the '\x83' lead byte, it checks this array and naturally comes to the (wrong) conclusion that this is not a lead byte. So it happily calls the MultiByteToWideChar() Win-API function, which of course fails to convert the halved character. _Mbrtowc() therefore returns the error code -1, which more or less cancels everything up the call stack, and ultimately 2 (std::codecvt_base::result::error) is returned.
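
To illustrate the lead-byte check described above, here is a minimal sketch of a 32-byte lead-byte bitmap. It is not the CRT's code: LeadByteTable, mark_lead_range, is_lead_byte, make_cp932_table, char_length and demo are hypothetical names, and the bit layout (bit c & 7 of element c >> 3) is an assumption made for the example. The lead-byte ranges 0x81-0x9F and 0xE0-0xFC are those of code page 932.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the 32-byte _Isleadbyte member:
// 32 * 8 = 256 bits, one bit per possible byte value.
using LeadByteTable = std::array<unsigned char, 32>;

// Assumed bit layout: byte value c maps to bit (c & 7) of element (c >> 3).
inline void mark_lead_range(LeadByteTable& table, unsigned first, unsigned last)
{
    for (unsigned c = first; c <= last; ++c)
        table[c >> 3] |= static_cast<unsigned char>(1u << (c & 7));
}

inline bool is_lead_byte(const LeadByteTable& table, unsigned char c)
{
    return (table[c >> 3] & (1u << (c & 7))) != 0;
}

// The table as code page 932 would need it populated.
inline LeadByteTable make_cp932_table()
{
    LeadByteTable table{};              // starts out all zeros
    mark_lead_range(table, 0x81, 0x9F);
    mark_lead_range(table, 0xE0, 0xFC);
    return table;
}

// Number of bytes a decoder using this table would treat as one character.
inline std::size_t char_length(const LeadByteTable& table, const char* s)
{
    return is_lead_byte(table, static_cast<unsigned char>(s[0])) ? 2 : 1;
}

// Self-check with the input from the question.
inline void demo()
{
    const char cs[] = "\x83\x41";
    assert(char_length(LeadByteTable{}, cs) == 1);     // all-zero table: 0x83 treated as a single byte
    assert(char_length(make_cp932_table(), cs) == 2);  // populated table: two-byte character
}
```

With the all-zero table, 0x83 is classified as a single-byte character, so only half of the two-byte sequence would be handed to the conversion routine, which matches the failure observed in the debugger.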

Is this a bug in the MS standard library (it seems so)? (How) can I work around this in a portable way (i.e. with the least amount of #ifdefs)?

Oberon

2 Answers

I copy-pasted your code into VC2010 on Windows 7 64-bit.

It works as you expect. Here's the output:

Japanese_Japan.932
0
30a2

It's probably a bug introduced with VC2012...

OOEngineer

I reported it internally to Microsoft. They have now filed it as a new bug (DevDiv#737880). But I recommend filing a Connect item at: http://connect.microsoft.com/VisualStudio

Jochen Kalmbach
  • Thank you! Actually, I am not directly affected by this bug; I found it when trying to create a minimal example for a related (3rd party library) bug. But I think I will take the time and file a Connect bug. – Oberon Jul 16 '13 at 12:25
  • Unfortunately, it seems like I can't file a bug: I'm stuck in a loop where I'm repeatedly asked to complete my required profile information: connect.microsoft.com "Complete your required profile information" --(Next)--> social.microsoft.com "Edit My Profile" --(Save)--> profile.microsoft.com "Register" --(Next)--> (back to start). Of course, I have filled in all required (*) fields. – Oberon Jul 16 '13 at 12:42