
I am trying to read a text file encoded in Shift-JIS (code page 932) using std::wifstream and std::getline. The following code works in VS2010 but fails in VS2013:

    std::wifstream in;
    in.open("data932.txt");

    const std::locale locale(".932");

    in.imbue(locale);

    std::wstring line1, line2;
    std::getline(in, line1);
    std::getline(in, line2);
    const bool good = in.good();

The file contains several lines: the first contains only ASCII characters, and the second is Japanese script. When this snippet runs, line1 should therefore contain the ASCII line, line2 the Japanese text, and good should be true.

When compiled in VS2010, the result is as expected. But when compiled in VS2013, line1 contains the ASCII line, but line2 is empty, and good is false.
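Since good() being false can mean end-of-file as well as a read or conversion failure, it may help to look at the individual state flags after each std::getline call. The following is a minimal sketch; describeState is a hypothetical helper name, not something from the original code.

```cpp
#include <ios>
#include <string>

// Returns a short description of which stream state flags are set,
// e.g. "good", "eof", or "eof|fail" (hypothetical helper).
std::string describeState(std::ios_base::iostate state) {
    if (state == std::ios_base::goodbit) {
        return "good";
    }
    std::string out;
    auto append = [&out](const char* name) {
        if (!out.empty()) out += '|';
        out += name;
    };
    if ((state & std::ios_base::eofbit) != 0)  append("eof");
    if ((state & std::ios_base::failbit) != 0) append("fail");
    if ((state & std::ios_base::badbit) != 0)  append("bad");
    return out;
}
```

Calling describeState(in.rdstate()) after each read shows whether the stream hit end-of-file or reported a failure.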

I debugged into the CRT (its source ships with Visual Studio) and found that an internal function, _Mbrtowc (in xmbtowc.c), was modified between the two versions. The way it detects the lead byte of a double-byte character changed, and the VS2013 version fails to detect a lead byte, so it fails to decode the byte stream.

Further debugging led me to the place where a _Cvtvec object's _Isleadbyte array is initialized (in the function _Getcvt(), in xwctomb.c); that initialization produces a wrong result. It seems to always use code page 1252, which is the default code page on my system, rather than 932, which is set for the stream in use. However, I could not determine whether this is by design and I missed a required step, or whether it is indeed a bug in the VS2013 CRT.
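To illustrate why the lead-byte table matters, here is a minimal sketch. It is not the CRT's code: it assumes the usual code page 932 lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC), and the names isLeadByte932, noLeadBytes1252, and countChars are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Assumed lead-byte ranges for code page 932 (Shift-JIS):
// 0x81-0x9F and 0xE0-0xFC.
bool isLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Code page 1252 is a single-byte code page, so no byte is a lead byte.
bool noLeadBytes1252(unsigned char) {
    return false;
}

// Counts characters in a byte string, consuming two bytes whenever the
// supplied predicate reports a lead byte (hypothetical helper).
std::size_t countChars(const std::string& bytes, bool (*isLead)(unsigned char)) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (isLead(b) && i + 1 < bytes.size()) {
            ++i;  // skip the trail byte of a double-byte character
        }
        ++count;
    }
    return count;
}
```

With the 932 predicate, the two bytes of each double-byte character are grouped together; with a table that has no lead bytes, every byte is treated as its own character. The sketch only shows how the table drives byte grouping; the actual failure in the CRT shows up as a failed read, as described above.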

Unfortunately I don't have VS2012 installed, so I could not test on that version.

Any insights on this topic are welcome!

Peter B
  • Post this to connect.microsoft.com – Hans Passant Oct 28 '14 at 21:12
  • I would start off by trying to `imbue()` the new locale prior to opening the file: I think the stream may read characters during open, and once characters are read it won't change the `std::codecvt<...>` facet it uses. – Dietmar Kühl Oct 28 '14 at 21:27
  • @DietmarKühl I've just checked it, but the results were the same: works in VS2010, but fails the same way in VS2013. – Peter B Oct 28 '14 at 21:37
  • @HansPassant report sent to msft connect: https://connect.microsoft.com/VisualStudio/feedback/details/1014054/shift-jis-decoding-fails-using-wifstrem-in-visual-c-2013 – Peter B Oct 29 '14 at 09:10
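For reference, the imbue-before-open ordering suggested in the comments can be sketched as follows. The helper name readLines is hypothetical, and per the follow-up comment this ordering did not change the VS2013 result; it is shown only to make the suggestion concrete.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Reads all lines from `path`, imbuing `loc` before the file is opened
// (hypothetical helper illustrating the ordering only).
std::vector<std::wstring> readLines(const std::string& path, const std::locale& loc) {
    std::wifstream in;
    in.imbue(loc);          // set the locale first ...
    in.open(path.c_str());  // ... then open the file

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

With the file from the question, the call would look like readLines("data932.txt", std::locale(".932")).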

1 Answer


I have found a workaround: if I explicitly set the global multibyte (MBC) code page to 932 while the locale is being created, the locale is initialized correctly, and the lines are read and decoded as expected.

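    // _getmbcp and _setmbcp are declared in <mbctype.h>.
    // Temporarily switch the global multibyte code page to 932
    // while the locale is constructed.
    const int oldMbcp = _getmbcp();
    _setmbcp(932);
    const std::locale locale("Japanese_Japan.932");
    // Restore the previous multibyte code page.
    _setmbcp(oldMbcp);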
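One caveat with the snippet above: if the std::locale constructor throws (for example, because the named locale is not available), the old code page is never restored. A minimal sketch of a scope guard that restores it in all cases follows. ScopedCodePage and makeLocale932 are hypothetical names; the getter and setter are passed in as callables so the guard can be exercised without the MSVC CRT, and the MSVC-specific part is guarded by _MSC_VER.

```cpp
#include <functional>
#include <locale>
#include <utility>
#ifdef _MSC_VER
#include <mbctype.h>  // _getmbcp, _setmbcp (MSVC CRT only)
#endif

// Saves the current code page, switches to `newPage`, and restores the
// saved value on scope exit, even if an exception is thrown.
class ScopedCodePage {
public:
    ScopedCodePage(int newPage, std::function<int()> get, std::function<void(int)> set)
        : old_(get()), set_(std::move(set)) {
        set_(newPage);
    }
    ~ScopedCodePage() { set_(old_); }
    ScopedCodePage(const ScopedCodePage&) = delete;
    ScopedCodePage& operator=(const ScopedCodePage&) = delete;

private:
    int old_;
    std::function<void(int)> set_;
};

#ifdef _MSC_VER
// Builds the locale while the global multibyte code page is temporarily 932.
std::locale makeLocale932() {
    ScopedCodePage guard(932,
                         [] { return _getmbcp(); },
                         [](int cp) { _setmbcp(cp); });
    return std::locale("Japanese_Japan.932");
}
#endif
```

The stream would then be imbued with the result of makeLocale932(), exactly as with the locale object in the original snippet.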
Peter B
  • I have been [hit](http://stackoverflow.com/questions/33254089/double-byte-character-sequence-conversion-issue-in-visual-studio-2015) by this as well. I have filed a [bug report](https://connect.microsoft.com/VisualStudio/feedback/details/1925650) for the issue. – wilx Oct 21 '15 at 14:07