I need a C++ program that creates a new file whose name contains Unicode characters (for example, äöüé.txt) on both Windows and Linux. This is the code I use:
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::string nameOfFile;
    std::cout << "Please enter the name of file ! " << std::endl;
    std::cin >> nameOfFile;
    std::cout << "name = " << nameOfFile << std::endl;
    std::fstream mystream;
    mystream.open(nameOfFile, std::ios::out | std::ios::trunc | std::ios::binary);
    mystream.close();
    return 0;
}
I ran the same program on both Windows and Linux (built with Visual Studio 2015 on Windows and gcc 5.4 on Linux) and typed "äöüé.txt" in the terminal.
On Linux, the file is created with the correct name, "äöüé.txt". On Windows, however, the name of the created file is garbled ("„”‚.txt").
As far as I understand, this is caused by the encoding difference between Linux and Windows: Linux uses UTF-8, while Windows uses UTF-16.
What I need is to create the file with the correct name on Windows, just as on Linux.
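One way to check that assumption is to look at the raw bytes the program actually receives. The following is a minimal sketch, assuming a hypothetical helper named HexBytes that is not part of the original program. If the terminal delivers UTF-8, äöüé arrives as C3 A4 C3 B6 C3 BC C3 A9; any other byte sequence suggests the input uses a different encoding, such as a console code page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper: returns the bytes of `s` as
// space-separated, two-digit uppercase hex values.
std::string HexBytes(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned value = static_cast<unsigned char>(s[i]);
        const int written = std::snprintf(buf, sizeof buf, "%02X", value);
        assert(written == 2);  // one byte always formats as two hex digits
        (void)written;
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Printing HexBytes(nameOfFile) right after std::cin >> nameOfFile shows which encoding the name is in before it is passed to any file API.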
I have tried the following methods:
(1) Following "std::wstring VS std::string", I tried Microsoft's MultiByteToWideChar() function as described in detail in "Open utf8 encoded filename in c++ Windows", but it failed:
#ifdef _MSC_VER
std::wstring ToUtf16(std::string str)
{
    std::wstring ret;
    int len = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), NULL, 0);
    if (len > 0)
    {
        ret.resize(len);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), &ret[0], len);
    }
    return ret;
}
#endif

int main()
{
    std::string nameOfFile;
    std::cout << "Please enter the name of file ! " << std::endl;
    std::cin >> nameOfFile;
    std::cout << "name = " << nameOfFile << std::endl;
    std::ifstream iFileStream(
#ifdef _MSC_VER
        ToUtf16(nameOfFile).c_str()
#else
        nameOfFile.c_str()
#endif
        , std::ifstream::in | std::ifstream::binary);
    return 0;
}
(2) Following "How to create a file with UNICODE path on Windows with C++", I tried the CreateFile() function, but it failed:
int main()
{
    std::string nameOfFile;
    std::cout << "Please enter the name of file ! " << std::endl;
    std::cin >> nameOfFile;
    std::cout << "name = " << nameOfFile << std::endl;

    /*convert string to char array */
    int stringLen = nameOfFile.length();
    char* text = new char[stringLen + 1];
    std::strcpy(text, nameOfFile.c_str());

    /*Convert to utf-16*/
    HANDLE hFile = CreateFileA(nameOfFile.c_str(),
                               GENERIC_WRITE,
                               0,
                               NULL,
                               CREATE_NEW,
                               FILE_ATTRIBUTE_NORMAL,
                               NULL);
    if (hFile != INVALID_HANDLE_VALUE) {
        int file_descriptor = _open_osfhandle((intptr_t)hFile, 0);
        if (file_descriptor != -1) {
            FILE* file = _fdopen(file_descriptor, "w");
            if (file != NULL) {
                std::ofstream stream(file);
                stream << "Hello World\n";
                // Closes stream, file, file_descriptor, and file_handle.
                stream.close();
                file = NULL;
                file_descriptor = -1;
                hFile = INVALID_HANDLE_VALUE;
            }
        }
    }
    return 0;
}
(3) Following https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16 (see the example at the bottom), I tried to convert the name with std::codecvt_utf8_utf16 and then open the file with _wfopen() as described at https://learn.microsoft.com/en-us/previous-versions/yeby3zcb(v%3Dvs.140), but that failed too.
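For reference, the kind of conversion meant in (3) can be sketched as follows. This is only a minimal sketch, not the exact code that was tried: the helper name Utf8ToWide is hypothetical, and it assumes the input bytes really are UTF-8, which is not guaranteed for text typed into a Windows console. std::wstring_convert and std::codecvt_utf8_utf16 are C++11 facilities (deprecated since C++17).

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: converts a UTF-8 encoded name to UTF-16 code units
// stored in a std::wstring. Assumes `utf8` is valid UTF-8; from_bytes()
// throws std::range_error when the conversion fails.
std::wstring Utf8ToWide(const std::string& utf8)
{
    // A file name cannot contain an embedded NUL character.
    assert(utf8.find('\0') == std::string::npos);
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

On Windows the result would then be passed on as _wfopen(Utf8ToWide(nameOfFile).c_str(), L"w"). If the name arrives in a console code page rather than UTF-8, this conversion either throws or yields the wrong characters, which would be consistent with the failure described in (3).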
My constraints are:
C++11 only (I know that C++17 adds the filesystem library to the standard library, which would solve this problem, as described in "How to open an std::fstream (ofstream or ifstream) with a unicode filename?")
Boost is not allowed
The Qt library is not allowed
The only things I can use are the C++ standard library and Microsoft's libraries.
Do you have any ideas?
To Alan:
Thanks for your reply. I used the following code to check how the characters are encoded on my Windows machine:
int main()
{
    std::wstring nameOfFile;
    std::wcout << "Please enter the name of file ! " << std::endl;
    std::wcin >> nameOfFile;
    std::wcout << "name = " << nameOfFile << std::endl;

    /*convert string to char array */
    int stringLen = nameOfFile.length();
    wchar_t* text = new wchar_t[stringLen + 1];
    std::wcscpy(text, nameOfFile.c_str());

    /*Get the coding number*/
    std::cout << "strlen(text) : " << wcslen(text) << std::endl;
    std::cout << "text(ordinals) :";
    for (size_t i = 0, iMax = wcslen(text); i < iMax; ++i)
    {
        std::cout << " " << static_cast<unsigned int>(
            static_cast<unsigned char>(text[i])
        );
    }
    _wfopen(text, L"w");
    return 0;
}
The active code page on my Windows machine is 850, and the output shows that äöüé is encoded as 132 148 129 130, which, according to the table for code page 850, corresponds exactly to ä (132), ö (148), ü (129), é (130).
At the end of the code above, I call _wfopen() to create the file, but the file that is actually created still has a garbled name.
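These ordinals may also explain the garbled name produced by the first program. The sketch below decodes the same four bytes under two single-byte code pages. The lookup tables cover only these four bytes, the function names are hypothetical, and the idea that the file API interprets the bytes as Windows-1252 is an assumption inferred from the garbled name „”‚.txt, not something verified on the machine.

```cpp
#include <cassert>
#include <string>

// Code point for byte `b` under code page 850 (only the four observed
// bytes are mapped; other non-ASCII bytes become U+FFFD in this sketch).
char32_t FromCp850(unsigned char b)
{
    switch (b) {
        case 0x84: return 0x00E4;  // ä (132)
        case 0x94: return 0x00F6;  // ö (148)
        case 0x81: return 0x00FC;  // ü (129)
        case 0x82: return 0x00E9;  // é (130)
        default:   return b < 0x80 ? b : 0xFFFD;
    }
}

// Code point for byte `b` under Windows-1252 (same four bytes only).
char32_t FromCp1252(unsigned char b)
{
    switch (b) {
        case 0x84: return 0x201E;  // „
        case 0x94: return 0x201D;  // ”
        case 0x82: return 0x201A;  // ‚
        case 0x81: return 0xFFFD;  // no printable character; placeholder
        default:   return b < 0x80 ? b : 0xFFFD;
    }
}

// Decodes every byte of `bytes` with the given single-byte table.
std::u32string Decode(const std::string& bytes, char32_t (*table)(unsigned char))
{
    assert(table != nullptr);
    std::u32string out;
    for (char c : bytes) {
        out += table(static_cast<unsigned char>(c));
    }
    return out;
}
```

Read as code page 850, the bytes 132 148 129 130 give äöüé; read as Windows-1252, they give „ and ” and ‚ plus one byte with no printable character, which matches the three visible characters in the bad file name. If that assumption holds, the name would have to be converted from the console code page, not from UTF-8, before it is handed to a wide-character file function.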
By the way, std::fstream(), as used in my second example, cannot create a new file; it can only read an existing one.
I think fopen() and _wfopen() are the only functions that can create a new file rather than just read an existing one.
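That impression may come from the open mode rather than from the stream classes themselves: the second example opens a std::ifstream, which never creates a file, whereas the first program in this post opens a std::fstream with std::ios::out | std::ios::trunc and does create the file on Linux. A minimal sketch to check this, assuming a hypothetical probe function and a scratch path that is safe to delete:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical probe: returns true if std::ifstream cannot open a missing
// file while std::ofstream creates it. The scratch file is deleted both
// before and after the probe, so pass a path that holds nothing valuable.
bool OfstreamCreatesMissingFile(const std::string& scratchPath)
{
    assert(!scratchPath.empty());
    std::remove(scratchPath.c_str());  // start from "file does not exist"

    // Opening for reading must fail because nothing exists yet.
    const bool readFailed = !std::ifstream(scratchPath.c_str()).is_open();

    // Opening for writing with trunc creates the file.
    const bool created =
        std::ofstream(scratchPath.c_str(),
                      std::ios::out | std::ios::trunc | std::ios::binary)
            .is_open();

    // Now the same read-only open succeeds.
    const bool existsNow = std::ifstream(scratchPath.c_str()).is_open();

    std::remove(scratchPath.c_str());  // clean up the scratch file
    return readFailed && created && existsNow;
}
```

Calling OfstreamCreatesMissingFile("probe.tmp") with an ASCII-only name separates the question of whether a stream can create a file from the question of how a non-ASCII name is encoded.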