I am using C++ on Windows. I have some data in a std::string
that I want to write to a file with UTF-8 encoding. How do I do this?

-
What have you tried? All you need is basically `file << string;` – super Apr 29 '21 at 13:33
-
Do you need a BOM (https://en.wikipedia.org/wiki/Byte_order_mark) at the beginning of the file? – Richard Critten Apr 29 '21 at 13:34
-
Pretty sure that for a UTF-8 `ofstream` you can do `std::basic_ofstream` if that's what you're asking. – mediocrevegetable1 Apr 29 '21 at 13:40
-
I have tried file << string. But when I check the encoding of the created file in notepad, it is ANSI and not UTF-8. – Vikas Kakkar Apr 29 '21 at 13:44
-
@VikasKakkar The encoding shown by Notepad is the encoding it uses to interpret the data contained in your file (and to display it). It doesn't tell you what encoding was used to generate the file. Basically, encoding is just a convention (at a semantic level), but in reality, your file just contains bytes ^^ – Fareanor Apr 29 '21 at 13:51
-
If the only thing in the file is ASCII text, your notepad will tell you that it contains ASCII text. ASCII text that's formally UTF-8 encoded is identical to ASCII. There is no label attached to the file that says what encoding is used for it. notepad attempts to ***heuristically*** detect the file's encoding. – Sam Varshavchik Apr 29 '21 at 13:52
-
If the `std::string` is already UTF-8 encoded, just write its content as-is to the file. I would open the `ofstream` in `binary` mode and use `write()` instead of `operator<<`, though. If the string is not already in UTF-8, you will have to convert it first, such as with `WideCharToMultiByte(CP_UTF8)` or equivalent. – Remy Lebeau Apr 29 '21 at 14:12
-
_"...Check the encoding of the created file in notepad..."_ Put a BOM (see above comment) on the front of the file; Notepad will trust the BOM and use the right encoding. – Richard Critten Apr 29 '21 at 14:57
2 Answers
_"I have some data in a std::string that I want to write to a file with UTF-8 encoding. How do I do this?"_
If the string already contains the text in UTF-8 encoding, then simply write the data as-is. You can use `std::ofstream`, for example.
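A minimal sketch of that, assuming the string already holds valid UTF-8. The function name `write_utf8_file` is made up for illustration; binary mode and `write()` follow the suggestion in the comments, so that the runtime does not translate line endings on Windows.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes of `utf8_text` to `path` unchanged.
// Assumes `utf8_text` already holds valid UTF-8; no conversion is performed.
// Returns true if the file was opened and every byte was written.
bool write_utf8_file(const std::string& path, const std::string& utf8_text) {
    // Binary mode prevents '\n' from being expanded to "\r\n" on Windows.
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

For example, `write_utf8_file("out.txt", "caf\xC3\xA9\n")` would store the word "café" followed by a newline, since `C3 A9` is the UTF-8 encoding of "é".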
If the string doesn't contain the data in UTF-8, then before writing, you must first convert it from the encoding that the data is currently in. The C++ standard library doesn't have general character encoding conversion functions (disregarding a few that are deprecated). There's generally no guaranteed way to detect the current encoding. You should simply know it beforehand.
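As an illustration of converting from a known encoding, here is a sketch that assumes the source is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. That assumption and the name `latin1_to_utf8` are mine, not from the question. A Windows "ANSI" code page such as Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range, so for those you would more likely go through the Win32 functions mentioned in the comments (`MultiByteToWideChar` followed by `WideCharToMultiByte(CP_UTF8)`).

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Latin-1 is only an assumed source encoding for this sketch.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // 7-bit ASCII is encoded identically in UTF-8.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // Code points U+0080..U+00FF take two bytes: 110xxxxx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

The result can then be written to the file byte for byte, like any other string that is already UTF-8.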
_"But when I check the encoding of the created file in notepad, it is ANSI and not UTF-8"_
As I mentioned in the previous section about detecting the source encoding of the string, there is no guaranteed way to do that. Notepad doesn't have this superpower either. It probably uses simple heuristics to guess the encoding, and sometimes it guesses wrong.
UTF-8 represents the characters of 7-bit ASCII with exactly the same bytes as ASCII itself. (What Notepad calls "ANSI" is the system's legacy Windows code page, such as Windows-1252, which agrees with ASCII in the 7-bit range.) If your string contains only those characters, then the UTF-8 encoding of the string is indistinguishable from ASCII. In that case, Notepad is likely to report "ANSI" (although technically that guess is also correct, since the UTF-8 text would incidentally be valid ASCII as well).
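If you want Notepad to label the file as UTF-8 even when it contains only ASCII, the comments suggest putting a byte order mark in front of the data. A sketch of that, again assuming the string is already UTF-8; the function name `write_utf8_file_with_bom` is hypothetical, and the BOM bytes `EF BB BF` are the UTF-8 encoding of U+FEFF.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: writes a UTF-8 byte order mark, then `utf8_text`.
// Assumes `utf8_text` already holds valid UTF-8 and has no BOM of its own.
bool write_utf8_file_with_bom(const std::string& path, const std::string& utf8_text) {
    // U+FEFF encoded as UTF-8.
    static const char bom[] = {'\xEF', '\xBB', '\xBF'};

    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    out.write(bom, sizeof bom);
    out.write(utf8_text.data(), static_cast<std::streamsize>(utf8_text.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

Keep in mind that the BOM becomes part of the file's content, and some other tools that read UTF-8 do not expect one.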

-
"*C++ standard library doesn't have general character encoding conversion functions*" - actually it does have a few, but they are not very good. And the one that would actually be useful here - `std::wstring_convert` with `std::codecvt_utf8/_utf16` - is deprecated with no replacement in sight yet. – Remy Lebeau Apr 29 '21 at 14:16
-
@RemyLebeau Why would `std::codecvt_utf8/_utf16` or `std::wstring_convert` be useful in converting some narrow encoding stored in `std::string` into another narrow encoding (specifically UTF-8). Neither of them is UTF-16. – eerorika Apr 29 '21 at 14:32
-
A narrow-to-narrow conversion requires an intermediate conversion to Unicode/UTF-16, so narrow -> Unicode/UTF-16 -> narrow/UTF-8. `wstring_convert`/`codecvt` is useful for that 2nd step, at least. – Remy Lebeau Apr 29 '21 at 17:54
This is similar to How do I write a UTF-8 encoded string to a file in windows, in C++.
Note that file I/O differs across platforms: on Windows you have CreateFile, WriteFile, ReadFile and CloseHandle (which are not limited to files and can also operate on device drivers), whereas on Linux you have a different set of functions. It's best to check the API of the platform you intend to use (in your case, Windows).

-
Um, yes, there are platform-specific ways of managing files. But the C++ standard library has code for managing files that masks those differences so you don't have to write different code for different platforms. – Pete Becker Apr 29 '21 at 14:08