2

I have an application that does file I/O with wide strings, reading and writing the file contents as UTF-8.

Working Code:

const wchar_t* wc = L"C:\Documents\TestPath\TestFile.txt";
std::wfstream wf(wc);
wf.imbue(std::locale(wf.getloc(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>()));
return wf.is_open();
...
wf << L"測試文件夾" << L"\n";

However, once Unicode characters are introduced in the file path, the file no longer opens properly. I.e., the following code does not work and returns false:

const wchar_t* wc = L"C:\Documents\測試文件夾\TestFile.txt";
std::wfstream wf(wc);
return wf.is_open();

What am I doing wrong here? It seems like there should be a simple way to get wfstream working with Unicode file paths, but I have searched all over the internet and cannot find one.

Thanks

  • Do you not get any warnings from the compiler? Even with no warning options specified, MSVC [gives me some](https://gcc.godbolt.org/z/x9oE9v). – chris Dec 18 '20 at 19:05
  • There are no warnings from the compiler. When I follow the stack to see where the open is failing, it seems to be at line 234, where `if (_Myfile != 0 || (_File = _Fiopen(_Filename, _Mode, _Prot)) == 0)` produces a failbit. I don't think there's anything wrong with the code, as it works fine with ASCII file paths and Unicode file content. – Feels Like C-- Dec 18 '20 at 20:05
  • `L"...` is not a UTF-8 encoded literal. Your source file might be UTF-8 encoded. Are you using `/utf-8` compiler switch? – n. m. could be an AI Dec 18 '20 at 21:17

2 Answers

2

Thanks for the help everyone.

I found out how to get the code working with an unusual solution, which might help anyone else in the same situation:

  1. Use the C-style _wfopen to create the file:
FILE* fp = _wfopen(cpFullPath, L"w");
if (fp) fclose(fp); // _wfopen returns NULL on failure
  2. Use the ::GetShortPathNameW function to get the short ASCII representation of the Unicode path to the newly created file:
wchar_t short_path[511] {};
::GetShortPathNameW(cpFullPath, short_path, 511);
// cpFullPath is L"C:\\Desktop\\測試文件夾\\тестовая папка\\file.txt"
// short_path becomes L"C:\\Desktop\\12BE~3\\81C2~6\\file.txt"
  3. Open the file using wfstream and imbue the stream for UTF-8 I/O:
std::wfstream textFileStream;
textFileStream.open(short_path, std::ios::in | std::ios::out);
textFileStream.imbue(std::locale(textFileStream.getloc(), new std::codecvt_utf8<wchar_t, 0x10ffff, std::consume_header>()));
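The short-path workaround relies on the volume actually providing short names for the folders involved. As a possible alternative, here is a minimal sketch that assumes a C++17 compiler and passes a `std::filesystem::path` straight to the stream, so no short path is needed. It is a different technique from the steps above: it writes UTF-8 bytes through a narrow `std::ofstream` instead of imbuing a wide stream. The helper names `write_line` and `read_first_line` are made up for illustration.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: writes one line of UTF-8 text to `file`,
// creating the parent directories first if they do not exist.
// The path object itself is handed to the stream, so the library
// opens the file using the platform's native path encoding.
bool write_line(const fs::path& file, const std::string& utf8_line) {
    std::error_code ec;
    if (!file.parent_path().empty()) {
        fs::create_directories(file.parent_path(), ec);
        if (ec) return false;
    }
    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    if (!out.is_open()) return false;
    out << utf8_line << '\n';
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: reads the first line back as raw bytes.
// Returns an empty string if the file cannot be opened.
std::string read_first_line(const fs::path& file) {
    std::ifstream in(file, std::ios::binary);
    std::string line;
    std::getline(in, line);
    return line;
}
```

On Windows, a call might look like `write_line(fs::path(L"C:\\Documents\\\u6e2c\u8a66\u6587\u4ef6\u593e\\TestFile.txt"), "some text")`; the wide string is used as the native path there, which is the point of the sketch.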
0

Your string literals need to escape the \ characters, e.g.:

const wchar_t* wc = L"C:\\Documents\\TestPath\\TestFile.txt";
const wchar_t* wc = L"C:\\Documents\\測試文件夾\\TestFile.txt";

Otherwise, use raw string literals instead:

const wchar_t* wc = LR"(C:\Documents\TestPath\TestFile.txt)";
const wchar_t* wc = LR"(C:\Documents\測試文件夾\TestFile.txt)";
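As a quick sanity check, the following sketch compares the escaped and raw spellings of the ASCII path and counts the separators that end up in the string. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Escaped form: each "\\" in the source becomes one backslash in the string.
std::wstring escaped_path() {
    return L"C:\\Documents\\TestPath\\TestFile.txt";
}

// Raw form: everything between R"( and )" is taken verbatim.
std::wstring raw_path() {
    return LR"(C:\Documents\TestPath\TestFile.txt)";
}

// Counts the backslash separators that are actually stored in the string.
std::size_t count_backslashes(const std::wstring& s) {
    std::size_t n = 0;
    for (wchar_t c : s) {
        if (c == L'\\') ++n;
    }
    return n;
}
```

Both spellings should yield the same string with three separators. Running the same count on a literal written with single backslashes is one way to see that the separators never made it into the string.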

That being said, double-check that the charset used to save your .cpp file matches the charset the compiler uses to parse the file; otherwise non-ASCII characters like 測試文件夾 won't work correctly in string literals.

Otherwise, use Unicode escape sequences instead:

const wchar_t* wc = L"C:\\Documents\\\u6e2c\u8a66\u6587\u4ef6\u593e\\TestFile.txt";
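A small sketch of how one might confirm what the compiler stored: it builds the folder name purely from escape sequences and checks each element against the expected code point. The names `folder_from_escapes` and `has_expected_code_points` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Folder name spelled only with universal character names,
// so the source file stays pure ASCII.
std::wstring folder_from_escapes() {
    return L"\u6e2c\u8a66\u6587\u4ef6\u593e";
}

// Returns true when `s` holds exactly the five expected code points.
bool has_expected_code_points(const std::wstring& s) {
    static const unsigned long expected[] = {0x6e2c, 0x8a66, 0x6587, 0x4ef6, 0x593e};
    const std::size_t count = sizeof(expected) / sizeof(expected[0]);
    if (s.size() != count) return false;
    for (std::size_t i = 0; i < count; ++i) {
        if (static_cast<unsigned long>(s[i]) != expected[i]) return false;
    }
    return true;
}
```

Passing the directly typed literal `L"測試文件夾"` to `has_expected_code_points` gives a rough way to detect a mismatch between the file's charset and the one the compiler assumes: if it returns false, the literal was probably decoded incorrectly.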
Remy Lebeau