
I'm using std::wofstream to write characters to a text file. My characters can come from very different languages (from English to Chinese). I want to print my vector<wstring> into that file. If my vector contains only English characters, I can print them without a problem. But if I write Chinese characters, my file remains empty.

I browsed through Stack Overflow, and all the answers basically said to use functions from the library:

#include <codecvt>

I can't include that library, because I am using Dev-C++ version 5.11. I added #define UNICODE to all my header files. I guess there is a really simple solution to this problem. It would be great if someone could help me out.

My code:

#define UNICODE
#include <string>
#include <fstream>

using namespace std;

int main()
{
    string Path = "D:\\Users\\\t\\Desktop\\korrigiert_RotCommon_zh_check_error.log";
    wofstream Out;
    wstring eng = L"hello";
    wstring chi = L"程序";

    Out.open(Path, ios::out);

    //works.
    Out << eng;

    //fails
    Out << chi;

    Out.close();

    return 0;
}

Kind Regards

  • You should probably add some code to show what you are doing. – Gaurav Sehgal Feb 16 '18 at 09:24
  • Please [read about how to ask good questions](http://stackoverflow.com/help/how-to-ask), and learn how to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). Without knowing what you do it's impossible for us to help you. – Some programmer dude Feb 16 '18 at 09:24
  • Despite the edit, this is still not considered a Minimal, Complete, and Verifiable Example. There is too much here that is irrelevant to the question, e.g. the strings being stored in a vector, the nested loops and so on. Try to create a new application from scratch that showcases your problem with a minimum amount of code. This will greatly improve your chances of getting a good answer. – zett42 Feb 16 '18 at 10:00
  • `UNICODE` selects the version of Windows API calls. It's the `_UNICODE` preprocessor symbol that applies to the C Runtime. But since you aren't using any of the generic-text mappings anyway, neither symbol is required. That said, have you verified, that `chi` actually holds the characters you expect? Probably related, if not a duplicate: [What are the different character sets used for?](https://stackoverflow.com/q/27872517/1889329). – IInspectable Feb 16 '18 at 11:01
  • Yes, I've verified it. I can use Chinese characters in my program. I can even write them into Excel using a COM object. I just can't print them into a normal text file. –  Feb 16 '18 at 11:08
  • By *"verification"* I meant looking at the raw bytes in memory, and comparing them against the expected UTF-16 encoding. Transferring them into a program that does who-knows-what to account for common programming errors is not helpful. Also, how do you determine, that your text file is wrong? Again, load it up in a hex editor and examine the raw bytes. Do they match the expected UTF-16 sequence? – IInspectable Feb 16 '18 at 11:23
  • You really ought to avoid `using namespace std` - it is a bad habit to get into, and [can silently change the meaning of your program](/q/1452721) when you're not expecting it. Get used to using the namespace prefix (`std` is intentionally very short), or importing *just the names you need* into the *smallest reasonable scope*. – Toby Speight Feb 16 '18 at 12:00
  • You should probably show the code that detects the Chinese output failing. For me, `Out << chi` succeeds, but it's the subsequent `Out << std::flush` (that you didn't write, but is implicit in the `close()`) that sets `std::ios_base::badbit` on the stream. – Toby Speight Feb 16 '18 at 12:09
  • My file only contains "hello" if I execute my program. –  Feb 16 '18 at 12:16
  • @TobySpeight I didn't do that in my main program. I just tried to fulfill what @Some programmer dude wrote. –  Feb 16 '18 at 12:30
  • Regarding the lack of <codecvt>: it's 2018 now, and that is a C++11 component. If your Dev-C++ setup does not have it, it is only good for computer archeology, not for C++ programming. – Cubbi Feb 27 '18 at 14:44
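Toby Speight's comment notes that the failure shows up on the flush, not at `Out << chi`. A small diagnostic, sketched below, makes that visible by flushing explicitly and then checking the stream state. The helper name write_and_check is hypothetical, not from the question; it uses whatever locale the stream already has, so it reports the problem rather than fixing it.

```cpp
#include <fstream>
#include <string>

// Hypothetical diagnostic helper: writes text with the stream's default
// locale and reports whether the stream survived the flush. A failed
// wchar_t -> char conversion typically sets badbit here, not at operator<<.
bool write_and_check(const std::string& path, const std::wstring& text)
{
    std::wofstream out(path);
    if (!out.is_open())
        return false;
    out << text;
    out.flush();        // forces the codecvt conversion and the actual write
    return !out.fail(); // fail() is also true when badbit is set
}
```

Calling it with the Chinese string and printing the result shows whether the conversion, rather than the file opening, is what fails.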

2 Answers


Even though the name wofstream suggests a wide-character stream, the file it writes is not wide. Internally it handles wchar_t, but it uses a convert facet from its locale to convert each wchar_t to char before the bytes reach the file.

Here is what cppreference says:

All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.

So you can either set the global locale to one that supports Chinese or imbue the stream with such a locale. In both cases the file is still written as a byte stream (a multibyte encoding such as UTF-8).

#include <codecvt>
#include <locale>
//...
const std::locale loc = std::locale(std::locale(), new std::codecvt_utf8<wchar_t>);

Out.open(Path, ios::out);
Out.imbue(loc);
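Putting those pieces together, here is a minimal self-contained sketch, assuming your standard library ships <codecvt> (libstdc++ only gained it in GCC 5, which is likely why an older Dev-C++ toolchain lacks it). The helper names write_utf8 and read_bytes are illustrative, not from the answer; read_bytes only loads the file back so the encoding can be checked.

```cpp
#include <codecvt>   // deprecated since C++17, but still provided by libstdc++ (GCC 5+) and MSVC
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper: writes a wide string to a UTF-8 file.
// The std::locale constructor takes ownership of the facet allocated with new.
bool write_utf8(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    out.imbue(std::locale(std::locale(), new std::codecvt_utf8<wchar_t>)); // imbue before any I/O
    out.open(path, std::ios::out);
    out << text;
    out.close();        // close() sets failbit if the final conversion/write failed
    return !out.fail();
}

// Hypothetical helper: reads a file back as raw bytes.
std::string read_bytes(const std::string& path)
{
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

For example, write_utf8(path, L"\u7A0B\u5E8F") should leave the six bytes E7 A8 8B E5 BA 8F in the file, the UTF-8 encoding of 程序, with no byte order mark.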

Unfortunately, std::codecvt_utf8 is deprecated in C++17 [2]. The MSDN Magazine article "C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs" explains how to do UTF-8 conversions with the Win32 functions MultiByteToWideChar and WideCharToMultiByte.

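Here is a to_utf8 conversion adapted from the one in Microsoft/vcpkg (Windows only):

#include <string>
#include <windows.h>

// Converts a UTF-16 std::wstring to UTF-8 using the Win32 API.
std::string to_utf8(const std::wstring& w)
{
    const int len = static_cast<int>(w.size());
    // First call: query the required size in bytes (no NUL counted, since len is explicit).
    const int size = WideCharToMultiByte(CP_UTF8, 0, w.data(), len, nullptr, 0, nullptr, nullptr);
    if (size <= 0)
        return std::string();   // empty input or conversion failure
    std::string output(size, '\0');
    // Second call: convert into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &output[0], size, nullptr, nullptr);
    return output;
}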
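If neither <codecvt> nor the Win32 API is an option, the conversion is small enough to write by hand. The sketch below is a plain UTF-8 encoder, a swapped-in technique rather than anything from the answer, and wide_to_utf8 is a hypothetical name. It assumes wchar_t holds UTF-16 (Windows) or UTF-32 (Linux) and does not validate lone surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8.
// Assumes wchar_t holds UTF-16 (2 bytes) or UTF-32 (4 bytes) code units.
// Lone surrogates are not validated and are encoded as-is.
std::string wide_to_utf8(const std::wstring& w)
{
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);
        // Combine a UTF-16 high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const unsigned long lo = static_cast<unsigned long>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (cp > 0x10FFFF)
            cp = 0xFFFD;  // out of Unicode range: use the replacement character
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting std::string can then be written with an ordinary std::ofstream opened in binary mode, e.g. `out << wide_to_utf8(chi);`, so no locale or facet is involved at all.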

Alternatively, you can use a normal std::ofstream in binary mode and write the raw wstring data with write(). This produces UTF-16 only where wchar_t is 2 bytes, as on Windows.

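#include <cstdint>
#include <fstream>
//...
std::ofstream Out(Path, ios::out | ios::binary);

const uint16_t bom = 0xFEFF;
Out.write(reinterpret_cast<const char*>(&bom), sizeof(bom));    // optional byte order mark (bytes FF FE on little-endian machines)

// Raw wchar_t data: UTF-16LE on Windows (2-byte wchar_t); on Linux wchar_t is 4 bytes, so this would be UTF-32
Out.write(reinterpret_cast<const char*>(chi.data()), chi.size() * sizeof(wchar_t));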
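Because writing chi.data() directly yields UTF-16 only when wchar_t is 2 bytes, a more portable variant builds the UTF-16LE bytes explicitly. This is a sketch under that assumption; to_utf16le_with_bom and write_utf16le_file are hypothetical names, and the byte order is fixed to little-endian regardless of the machine.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: builds the UTF-16LE byte sequence (with BOM) for a wide string.
// Works whether wchar_t is 16-bit (already UTF-16) or 32-bit (UTF-32, split into surrogates).
std::string to_utf16le_with_bom(const std::wstring& w)
{
    std::string bytes = "\xFF\xFE";  // BOM U+FEFF in little-endian order
    auto put_unit = [&bytes](unsigned long unit) {
        bytes += static_cast<char>(unit & 0xFF);         // low byte first
        bytes += static_cast<char>((unit >> 8) & 0xFF);  // then high byte
    };
    for (wchar_t wc : w) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp >= 0x10000) {  // only possible with a 32-bit wchar_t
            cp -= 0x10000;
            put_unit(0xD800 + (cp >> 10));    // high surrogate
            put_unit(0xDC00 + (cp & 0x3FF));  // low surrogate
        } else {
            put_unit(cp);
        }
    }
    return bytes;
}

// Binary mode keeps the bytes unchanged (no "\n" -> "\r\n" translation on Windows).
bool write_utf16le_file(const std::string& path, const std::wstring& text)
{
    std::ofstream out(path, std::ios::out | std::ios::binary);
    const std::string bytes = to_utf16le_with_bom(text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

On Windows the loop passes existing surrogate pairs through unchanged; on Linux it splits code points above U+FFFF into surrogate pairs, so both platforms produce the same file.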
Mihayl
  • I can't include <codecvt>, because my IDE is Dev-C++ 5.11. <codecvt> seems to be an MS VS-only option. –  Feb 16 '18 at 12:32
  • The function .imbue() gives me a runtime error, as I mentioned to Toby Speight. –  Feb 16 '18 at 12:50
  • _"Unfortunately std::codecvt_utf8 is already deprecated[2]"_ -- from that link: _" this library component should be retired to Annex D, along side <codecvt>, until a suitable replacement is standardized"_ -- until there is any such replacement in sight I see no reason to stop using `std::codecvt_utf8`. – zett42 Feb 16 '18 at 17:07
  • +1 for adding the 0xFEFF BOM... it's much more convenient than using WideCharToMultiByte. Not fully sure of the implications, but it works... – Mecanik Aug 29 '22 at 07:16

You forgot to tell your stream what locale to use:

Out.imbue(std::locale("zh_CN.UTF-8"));

You'll obviously need to include <locale> for this.
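The std::locale constructor throws std::runtime_error when the runtime does not recognize the locale name, and an uncaught exception ends in std::terminate. That is most likely the "terminate it in an unusual way" crash reported on this answer. A defensive sketch follows; try_imbue is a hypothetical helper, and which names are accepted depends entirely on the C++ runtime in use.

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: imbues a named locale, returning false instead of
// terminating when the runtime does not know that name
// (std::locale's name constructor throws std::runtime_error in that case).
bool try_imbue(std::wofstream& out, const char* locale_name)
{
    try {
        out.imbue(std::locale(locale_name));
        return true;
    } catch (const std::runtime_error&) {
        return false;  // the stream keeps its previous locale
    }
}
```

If `try_imbue(Out, "zh_CN.UTF-8")` returns false, the program can fall back to one of the codecvt-free approaches instead of crashing.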

Toby Speight
  • When I copy that into my program, I get a crash at this line. It's a Microsoft Visual C++ Runtime Library error. It says "This application has requested the Runtime to terminate it in an unusual way". –  Feb 16 '18 at 12:24
  • UTF-8 locales are not supported by MS C++ runtime. – zett42 Feb 16 '18 at 12:56
  • I don't know the Microsoft runtime, but you'll probably have to adapt how you construct the locale to suit its way of writing locale strings. I tested this on a Debian system (with the appropriate locale installed) and that changed your code from non-working to working. – Toby Speight Feb 16 '18 at 13:00
  • @zett42, James McNellis: "No, UTF-8 locales are not supported." - https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-great-c-runtime-crt-refactoring/ – Mihayl Feb 16 '18 at 13:16