2

I am trying to use G++ to compile a simple C++ program. I am running Windows 10 and have installed MinGW. I tried to compile the file C:\Users\Vesk\Desktop\Информатика\Hello World.cpp by typing g++ "C:\Users\Vesk\Desktop\Информатика\Hello World.cpp" -o "C:\Users\Vesk\Desktop\Информатика\Hello World.exe" in the Command Prompt. G++ did not compile the file, though, and gave me this error message:

g++: error: C:\Users\Vesk\Desktop\???????????\Hello World.cpp: Invalid argument
g++: fatal error: no input files
compilation terminated.

'Информатика' is just a word written in Cyrillic, so I was confused about what the problem was. I then renamed the 'Информатика' folder to 'Informatics' and tried to compile the file again with g++ "C:\Users\Vesk\Desktop\Informatics\Hello World.cpp" -o "C:\Users\Vesk\Desktop\Informatics\Hello World.exe". Lo and behold, it worked: G++ compiled the file, and the executable was there in the folder and working. But is there any way to compile a file if its path contains Cyrillic (or other Unicode) characters? If so, how?

Jorengarenar
Vesk
  • What do you see when you run `chcp` in a command prompt window? It displays the [code page](https://en.wikipedia.org/wiki/Code_page) that the Windows command processor uses by default, according to the country configured for your account. Next, what does `dir "%UserProfile%\Desktop" /AD /B` show? I suppose `chcp` outputs [855](https://en.wikipedia.org/wiki/Code_page_855), 872, or [866](https://en.wikipedia.org/wiki/Code_page_866), while g++ expects the Cyrillic letters to be encoded in UTF-8, as on Linux, or in code page [Windows-1251](https://en.wikipedia.org/wiki/Windows_1251). – Mofi May 21 '21 at 18:29
  • What happens if you run `chcp 65001` in the command prompt window to set UTF-8 as the character encoding and then run g++? – Mofi May 21 '21 at 18:31
  • @Mofi when I run `chcp` it says my active code page is 437. I don't really know what that is, but my Windows is set to English. I tried running `chcp 65001` and then I tried running g++ on the old folder again, but it gave me the same error. – Vesk May 22 '21 at 12:01
  • Well, the North American OEM [code page 437](https://en.wikipedia.org/wiki/Code_page_437) is definitely not the code page you should use with Cyrillic letters in file/folder names, as this code page (table) does not support these characters at all. You can use English Windows, as I do, but you should configure the region/country correctly for your country. See for example [How to change country or region home location in Windows 10](https://www.tenforums.com/tutorials/68106-change-country-region-home-location-windows-10-a.html). – Mofi May 22 '21 at 15:51
  • Next, restart Windows after configuring the country correctly. Then open a command prompt window and run `chcp` once again. It should now output one of the OEM code pages I mentioned in my first comment. It might also work to just run `chcp 855` or `chcp 866` in the command prompt window to change the code page from `437` to one that supports Cyrillic letters, and then use MinGW `g++`. But it is better to configure the country correctly or, if you keep using United States, not to use Cyrillic letters in file/folder names. – Mofi May 22 '21 at 15:56
  • @Mofi Thanks a lot, changing the region worked! – Vesk May 23 '21 at 15:45
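
The question marks in the error message are consistent with a lossy conversion from the UTF-16 path to a narrow code page that has no Cyrillic letters. A minimal sketch of that effect follows. The function `narrow_lossy` is hypothetical, and it assumes a target character set of plain ASCII for simplicity; a real code page such as 437 covers more characters, but still no Cyrillic.

```cpp
#include <string>

// Hypothetical stand-in for a lossy wide-to-narrow conversion: every UTF-16
// code unit that the target character set cannot represent becomes '?'.
// Assumption: the target set is plain ASCII (code units below 0x80).
std::string narrow_lossy(const std::u16string& wide) {
    std::string out;
    for (char16_t unit : wide) {
        out += (unit < 0x80) ? static_cast<char>(unit) : '?';
    }
    return out;
}
```

Each of the 11 letters of 'Информатика' is a single UTF-16 code unit above 0x7F, so the folder name turns into 11 question marks, and the resulting path no longer names an existing file.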

2 Answers

2

Windows uses UTF-16 for Unicode file names. To my knowledge, it does not support UTF-8 as a locale although that would be very useful.

I tried a very old MinGW G++ 4.6.3, and indeed it does not support Unicode characters in file paths that are outside the current locale. I don't know about more recent MinGW GCC versions. A first possible solution would be to use a Russian locale.

For a Windows application to properly support Unicode file names, it needs to handle paths as `wchar_t` wide characters. For example, the classical signature `int main(int argc, const char* argv[])` must be replaced by `int wmain(int argc, const wchar_t* argv[])`. For portable software like GCC, this is a complication that may not be worth it. Extremely few people will put characters that are outside their current locale in source file paths.

I tried G++ 10.2.0 on Cygwin and it works. This is because all Cygwin software links with cygwin1.dll, which, among other services, automatically converts all UTF-8 paths to UTF-16.
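
To give an idea of what such a UTF-8 to UTF-16 conversion involves, here is a minimal sketch. It is not Cygwin's actual code; the function name `utf8_to_utf16` is made up for illustration, and the sketch assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UTF-16 code units. Sketch only: it assumes well-formed
// input and does not reject overlong, stray, or truncated sequences.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte tells how many bytes the sequence has.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Each Cyrillic letter takes two bytes in UTF-8 and one code unit in UTF-16, so a path converted this way can be handed to the wide-character Windows file APIs without losing any characters.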

prapin
  • I thought [in Windows 10](https://stackoverflow.com/a/57134096/1983398) it's possible to set UTF-8 for console programs. – ssbssa May 21 '21 at 23:00
  • Thank you for the answer. I have everything set to English on my Windows and I prefer it that way. I guess I'll probably have to just avoid using Cyrillic for my folder and file names, seeing as there doesn't seem to be an easy solution. – Vesk May 22 '21 at 12:04
0

You should first get the command line in UTF-16 encoding with the GetCommandLineW function (https://learn.microsoft.com/en-us/windows/win32/api/processenv/nf-processenv-getcommandlinew) and then split it into tokens with CommandLineToArgvW (https://learn.microsoft.com/en-us/windows/win32/api/shellapi/nf-shellapi-commandlinetoargvw).

If you want UTF-8 encoded strings, you need to convert them. A simple, open source and useful tool to convert strings between different encodings in C++20 can be found here.
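
The two steps above can be sketched as follows. This is a minimal illustration, not the linked library: `utf16_to_utf8` and `utf8_args` are hypothetical names, the converter does not validate its input, and the Windows-only part is guarded by `_WIN32` so that the conversion routine still compiles elsewhere.

```cpp
#include <cstddef>
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#include <shellapi.h>
#endif

// Encode UTF-16 as UTF-8. Sketch only: unpaired surrogates are not rejected.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

#ifdef _WIN32
// Windows only: fetch the process arguments as UTF-16, return them as UTF-8.
std::vector<std::string> utf8_args() {
    std::vector<std::string> args;
    int argc = 0;
    LPWSTR* argv = CommandLineToArgvW(GetCommandLineW(), &argc);
    if (argv == nullptr) return args;
    for (int i = 0; i < argc; ++i) {
        // wchar_t is 16 bits wide on Windows, so the units copy across as-is.
        std::wstring w(argv[i]);
        args.push_back(utf16_to_utf8(std::u16string(w.begin(), w.end())));
    }
    LocalFree(argv);  // CommandLineToArgvW allocates one block for the array
    return args;
}
#endif
```

Calling `utf8_args()` at the start of `main` gives the program its arguments independently of the active code page. CommandLineToArgvW lives in shell32, so the program may need to be linked against that library, depending on the toolchain defaults.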

desio
  • In general, on Windows you should always use the W-suffixed versions of I/O functions if you need to handle any non-ASCII input/output, since the normal versions of these functions depend heavily on the current prompt code page (a one-byte ASCII extension encoding used by Windows, like Latin1), which differs a lot from one computer to another. – desio May 24 '21 at 08:54
  • GCC is a cross platform application, supporting a lot of different platforms. You can't expect it to use platform specific code to such a degree. – Brecht Sanders Jun 01 '21 at 05:34
  • GCC doesn't exist on Windows; there are Cygwin and MinGW ;) and their behavior with cin, cout and cerr is very different for non-ASCII characters: Cygwin converts everything to UTF-8, whereas MinGW uses the default code page (1251, for example). – desio Jun 01 '21 at 17:00
  • If you want to add the extra overhead of UTF-8/UTF-16 conversions and work only with UTF-8 strings, then you should use Cygwin and not MinGW (but the question is about MinGW). Otherwise you can write encoding-aware programs by using the linked library. – desio Jun 01 '21 at 17:05
  • Having built GCC myself for Windows from source I can assure you it does exist for Windows. MinGW is the system library it is built against (much like newlib for example on some other platforms). The topic of this question is why GCC itself doesn't support Unicode characters, not how to write a program that does. – Brecht Sanders Jun 01 '21 at 21:21