I'm currently learning C, and lately I have been focusing on character encoding. I'm a Windows programmer: I currently test my code only on Windows, but I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices now.
In the example below, I store a file path in a wchar_t array so it can be opened later with _wfopen. I need _wfopen because the file path may contain characters that are not in my default code page. Afterwards, the file path and a text literal are stored in a char array named message for further use. My understanding is that you can write a wide string into a multibyte string with the %ls conversion specifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]|
note: assuming directive output of 16382 bytes|
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094|
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the "Accessing: " string literal, the file_path variable, the \n character, and the terminating \0 character. What else is there to output?
Sure, I could declare message as a wchar_t array and use wsprintf instead of sprintf, but my understanding is that wchar_t does not make for nicely portable code. As such, I'm trying to avoid it unless a specific API requires it.
So, what am I missing?