
I've been struggling a lot to do something that looks simple: writing the contents of a std::wstring to disk. Suppose I have the following string that I want to write into a plain text file:

std::wstring s = L"输入法."; // random characters pulled from baidu.cn
  1. Using std::codecvt_utf8 or boost locale

Here is the code I used:

std::wofstream out(destination.wstring(), std::ios_base::out | std::ios_base::app);
const std::locale utf8_locale = std::locale(std::locale(), new boost::locale::utf8_codecvt<wchar_t>());
out.imbue(utf8_locale);
// Verify that the file opened correctly
out << s << std::endl;

This works fine on Windows, but sadly I also need to compile it on Linux: codecvt_utf8 is not yet available in the compilers shipped with the usual distributions, and Boost.Locale has only been included in Boost 1.60.0, which again is too recent for the distros' repositories. Without setting the locale, nothing is written to the file (on both systems).
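If neither std::codecvt_utf8 nor a recent Boost is available, one fallback is to do the wchar_t-to-UTF-8 conversion by hand and write plain bytes with a narrow std::ofstream. This is a minimal sketch, not a library API: the helper names to_utf8 and append_wstring_utf8 are hypothetical, and it assumes wchar_t holds UTF-16 when it is 2 bytes wide (Windows) and UTF-32 when it is 4 bytes wide (Linux/glibc).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to out.
static void append_utf8(std::string &out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8 without codecvt or Boost.
// Assumes UTF-16 for a 2-byte wchar_t and UTF-32 for a 4-byte wchar_t.
// Unpaired surrogates and out-of-range values are not validated.
std::string to_utf8(const std::wstring &ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        char32_t cp = static_cast<char32_t>(ws[i]);
        // On 16-bit wchar_t platforms, combine a surrogate pair into one code point
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            char32_t lo = static_cast<char32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical helper: append ws as UTF-8 plus a newline to the file at path.
// Binary mode keeps the bytes unchanged (no CRLF translation on Windows).
bool append_wstring_utf8(const std::string &path, const std::wstring &ws)
{
    std::ofstream out(path, std::ios::binary | std::ios::app);
    if (!out)
        return false;
    out << to_utf8(ws) << '\n';
    return static_cast<bool>(out);
}
```

For example, append_wstring_utf8("test.txt", s) appends s and a newline. Writing the literal as L"\u8f93\u5165\u6cd5." (the same 输入法.) also avoids depending on how the compiler decodes the source file.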

  2. With fwrite

Next attempt:

FILE* out = fopen("test.txt", "a+,ccs=UTF-8");
fwrite(s.c_str(), wcslen(s.c_str()) * sizeof(wchar_t), 1, out);
fclose(out);

This works on Windows, but does not write anything to the file on Linux. I also tried opening the file in binary mode, but that didn't change anything. Not setting the ccs part causes undecipherable garbage to be written to the file.
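One likely explanation for the empty file on Linux: glibc's fopen also accepts the ,ccs= suffix and marks such a stream as wide-oriented, and byte-oriented functions such as fwrite do not write to a wide-oriented stream. Without ccs, fwrite just copies the raw in-memory wchar_t values (4 bytes each on Linux), which is the garbage. A minimal sketch, assuming glibc, that keeps ccs=UTF-8 but writes with the wide function fputws instead; append_with_ccs is a hypothetical helper name.

```cpp
#include <cstdio>
#include <cwchar>
#include <string>

// Hypothetical helper: append ws plus a newline to path, encoded as UTF-8.
// Relies on glibc's ",ccs=" fopen extension, which makes the stream
// wide-oriented, so only wide functions (fputws, fputwc) write to it.
bool append_with_ccs(const char *path, const std::wstring &ws)
{
    std::FILE *out = std::fopen(path, "a,ccs=UTF-8");
    if (out == nullptr)
        return false;
    // fputws returns a negative value on error; fputwc returns WEOF
    const bool ok = std::fputws(ws.c_str(), out) >= 0 && std::fputwc(L'\n', out) != WEOF;
    return std::fclose(out) == 0 && ok;
}
```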

I'm obviously missing something here: what is the proper way to write that string to a file?

Sandburg
executifs
  • libiconv (https://www.gnu.org/software/libiconv/) could help –  May 23 '16 at 16:05
  • You don't need codecvt on Linux/gcc, just a UTF-8 locale. `std::locale::global(std::locale(""));` at the beginning of your program should do the trick if your default locale is a UTF-8 one. Of course, this won't work on Windows. Otherwise, just use Gentoo and forget about the fossilized packages provided by the usual distros. – n. m. could be an AI May 23 '16 at 16:05
  • Possible dupe, any of these answers work for you? I don't know anything about streams: http://stackoverflow.com/q/4053918/2069064 – Barry May 23 '16 at 16:06
  • libiconv and other recoding libraries don't provide a 100% multiplatform solution because the source encoding is machine-dependent (wchar_t is UTF-16 on Windows, UCS-4 on Linux). But it's a small dependence that can be managed. OTOH, keeping everything in UTF-8 encoded char (as opposed to wchar_t) *is* platform-independent. – n. m. could be an AI May 23 '16 at 16:50
  • @n.m.: Keeping everything UTF-16LE encoded is equally platform-independent. If you have more than one encoding to pick from, it's not the particular encoding you choose that makes it platform-independent; it's the fact that you agree on a single encoding. – IInspectable May 23 '16 at 18:50
  • @IInspectable "Keeping everything UTF-16LE encoded is equally platform-independent" Yes, but why do that? The world runs on utf-8. Just use it. Quite a no-brainer, really. – n. m. could be an AI May 23 '16 at 19:37
  • @n.m.: You are missing the point entirely. I wasn't proposing to use UTF-16 everywhere. I was pointing out that your statement (*"keeping everything in utf-8 encoded char (as opposed to wchat_t) is platform-independent."*) was inaccurate. – IInspectable May 23 '16 at 19:45
  • @IInspectable wchar_t is not utf-16 on all platforms, You can keep everything in utf-16 or you can keep everything in wchar_t, these are two different options. They are the same only on one platform I know of. Not even that popular any more. – n. m. could be an AI May 23 '16 at 20:03
  • @n.m.: What are you going on about? The statement is inaccurate. Agreeing on a single, well-defined encoding is sufficient to keep things platform-independent. Picking one encoding (UTF-8 in this case) meets that criterion, but doesn't describe it accurately. Also, give me a call when your product runs on hundreds of millions of devices, and you have to worry that it isn't popular anymore. I'll try my best to take you less seriously in the future. – IInspectable May 23 '16 at 20:10
  • @IInspectable I'm not getting it. What statement is inaccurate? That wchar_t is platform-dependent? It is as accurate as it gets. That UTF-8 is the only way to be platform-independent? I have never said that. What else? Also, I do in fact make a product that runs on hundreds of millions of devices (not alone, obviously), but I don't see why this is relevant to this discussion. – n. m. could be an AI May 23 '16 at 21:37
  • I think the problem is in the encoding of your source file on Linux, not in the actual code. I tried running your code on Linux, and it does in fact produce garbage. Then I modified it so that the `wstring` is not initialized from a literal in the source code, but read from another file that does contain the correct wide string (produced by your code on Windows). In that case the program worked just fine :) So I think that on Linux the compiler doesn't pick up the wide string literal correctly when reading the source code, or something along those lines. – notadam May 26 '16 at 16:20
  • By the way if you really want to be absolutely correct on all platforms, consider the endianness of your `wchar_t`s. It's not a problem with normal strings since a simple `char` is only one byte, but a `wchar_t` is represented differently on machines with different endianness :) – notadam May 26 '16 at 16:27
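The locale-only approach suggested in the comments (no codecvt, just a UTF-8 locale) can also be applied to a single stream instead of globally. A minimal sketch, assuming a UTF-8 locale such as "C.UTF-8" or "en_US.UTF-8" is installed; append_with_locale is a hypothetical helper name. With the default empty name it uses the environment's locale, as std::locale("") does, so it only produces UTF-8 if LANG/LC_ALL name a UTF-8 locale.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: append ws plus a newline to path through a wofstream,
// using an installed locale's codecvt<wchar_t, char, mbstate_t> facet instead
// of codecvt_utf8 or Boost.Locale. std::locale(name) throws
// std::runtime_error if the named locale does not exist.
bool append_with_locale(const std::string &path, const std::wstring &ws,
                        const char *locale_name = "")
{
    std::wofstream out;
    // Imbue before opening and writing so every character is converted
    out.imbue(std::locale(locale_name));
    out.open(path, std::ios_base::out | std::ios_base::app);
    if (!out)
        return false;
    out << ws << L'\n';
    out.flush();
    return static_cast<bool>(out);
}
```

If the locale is not a UTF-8 one (plain "C", for instance), the conversion typically fails, the stream's badbit is set and the function returns false, which matches the "nothing is written" symptom from the question.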

2 Answers


You can use the following code snippet. The difference from your code is that it uses std::codecvt_utf8 instead of boost::locale.

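#include <codecvt>
#include <fstream>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    std::wstring s = L"输入法.";

    // Locale whose codecvt facet converts wchar_t to UTF-8 on output
    // (std::codecvt_utf8 is deprecated since C++17 but still provided)
    const std::locale utf8_locale = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>());

    std::wofstream myfile;
    myfile.open("E:/testFile.txt");
    if (myfile.is_open())
    {
        // Imbue before writing so every character goes through the UTF-8 facet
        myfile.imbue(utf8_locale);
        myfile << s << std::endl;
        myfile.close();
    }
    else
    {
        std::cout << "Unable to open file";
    }
}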
Ionut V.

Wide-character output is converted to bytes using the locale in effect, and the default "C" locale cannot represent non-ASCII characters like these, so they never make it into the file. So first set up the locale for your output, and only then write anything to the file. This example should help; I ran it on Ubuntu.

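#include <clocale>
#include <cstdio>
#include <cwchar>
#include <string>

// fputws converts the wide string to multibyte using the LC_CTYPE locale
void write_string(std::FILE *fd, const std::wstring &str)
{
    std::fputws(str.c_str(), fd);
}

int main()
{
    // Take LC_CTYPE from the environment; this yields UTF-8 output only if
    // the environment's locale is a UTF-8 one (e.g. LANG=en_US.UTF-8)
    std::setlocale(LC_CTYPE, "");

    std::FILE *fd = std::fopen("./test.txt", "w");
    if (fd == nullptr)
        return 1;

    write_string(fd, L"输入法.");

    std::fclose(fd);

    return 0;
}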