I can use `ofstream` to write a file with a UTF-8 BOM. I can also write a Unicode string to a file using `wofstream` imbued with a `utf8_locale` (`codecvt_utf8`). However, I cannot find out how to write a Unicode string to a file with UTF-8 BOM encoding.

Ajay

Alex Huynh
- `utf-8` does not need `BOM`. – axiac Jun 02 '16 at 12:14
- @axiac: it doesn't need it, but it can help. In an ideal world all text would be accompanied by a MIME-type. Since this is not an ideal world, a BOM in a UTF-8 file helps software guess the encoding. – Steve Jessop Jun 02 '16 at 12:50
- https://stackoverflow.com/a/15914558/1599699 – Andrew Oct 31 '20 at 03:35
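The guessing Steve Jessop describes can be sketched as a check for the three BOM bytes at the start of a file. This is only an illustration: the helper name `has_utf8_bom` is an assumption, not something from the thread, and real encoding detection usually looks at more than the first three bytes.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the thread): returns true when the file at
// `path` starts with the three-byte UTF-8 BOM EF BB BF.
bool has_utf8_bom(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);  // binary: read the raw bytes
    char head[3] = {0, 0, 0};
    in.read(head, sizeof(head));
    if (in.gcount() != 3) return false;        // missing or too-short file
    return static_cast<unsigned char>(head[0]) == 0xEF &&
           static_cast<unsigned char>(head[1]) == 0xBB &&
           static_cast<unsigned char>(head[2]) == 0xBF;
}
```

A reader could call `has_utf8_bom` on a file it has just written to confirm that the BOM actually landed at offset 0.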
2 Answers
The BOM is just an optional sequence of bytes at the beginning of a file that identifies its encoding. It has nothing to do with `std::fstream` directly, as `fstream` is just a file stream for reading and writing arbitrary bytes/characters.
You just need to write the BOM manually before you write your UTF-8 encoded string.
const unsigned char utf8BOM[] = {0xEF, 0xBB, 0xBF};
// write() takes a const char*, so cast the byte array
fileStream.write(reinterpret_cast<const char*>(utf8BOM), sizeof(utf8BOM));
// write the rest of the UTF-8 encoded string...

David Haim
- Or if you're using a wide stream with the locale doing the UTF-8 encoding then it's just character `U+FEFF` – Steve Jessop Jun 02 '16 at 12:11
- @Dieter that's the byte sequence. The Unicode code point is (regardless of endianness) `U+FEFF` – rubenvb Jun 02 '16 at 12:22
- fstream can write the BOM to a file but cannot write a Unicode string (e.g. "日本医療政策機構" or "Phở"), as I mentioned in my question. – Alex Huynh Jun 03 '16 at 01:35
- FYI: you can also get the UTF-8 BOM with C++11 compilers by using `const char utf8Bom[] = u8"\uFEFF"` – Nicol Bolas Jun 03 '16 at 04:52
- To address @AlexHuynh's point, as someone with the same problem: following SteveJessop and rubenvb, after opening a `std::wofstream ofs`, I achieved success with `ofs << L"\uFEFF";`. – David Carr Jan 08 '23 at 01:29
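The wide-stream route from the comments above can be sketched as follows: imbue a `std::wofstream` with a `std::codecvt_utf8<wchar_t>` facet, then write `U+FEFF` as the first character so the facet itself emits the bytes EF BB BF. The function name `write_wide_as_utf8_bom` is hypothetical. Two caveats apply: `std::codecvt_utf8` is deprecated since C++17, and where `wchar_t` is 16 bits (as on Windows) this facet only covers the Basic Multilingual Plane, so `std::codecvt_utf8_utf16<wchar_t>` would likely be the better fit for text with surrogate pairs.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: writes `text` to `path` as UTF-8 with a leading BOM,
// letting the stream's locale do the wide-to-UTF-8 conversion.
bool write_wide_as_utf8_bom(const std::string& path, const std::wstring& text)
{
    std::wofstream out;
    // Imbue before opening so the facet is in place from the first write.
    // The locale takes ownership of the facet allocated with new.
    out.imbue(std::locale(out.getloc(), new std::codecvt_utf8<wchar_t>));
    out.open(path);
    if (!out) return false;
    out << L'\uFEFF';   // U+FEFF; the facet encodes it as EF BB BF
    out << text;
    out.flush();
    return static_cast<bool>(out);
}
```

With this approach the same stream handles both the BOM and the text, e.g. `write_wide_as_utf8_bom("out.txt", L"Phở\n")`, so no separate byte-level `write` call is needed.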
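The example below works in VS 2015 and recent gcc compilers (up to C++17; since C++20 a `u8` literal has type `char8_t` and no longer initializes a `std::string`):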
#include <fstream>
#include <string>

int main()
{
    // u8 literal: the string is stored as UTF-8 bytes
    std::string utf8 = u8"日本医療政策機構\nPhở\n";
    std::ofstream f("c:\\test\\ut8.txt");
    // write the three-byte UTF-8 BOM first, then the encoded text
    const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };
    f.write(reinterpret_cast<const char*>(bom), sizeof(bom));
    f << utf8;
    return 0;
}
In older versions of Visual Studio you have to declare a UTF-16 string (with the `L` prefix), then convert it from UTF-16 to UTF-8:
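#include <string>
#include <fstream>
#include <Windows.h>

// Convert a UTF-16 wide string to UTF-8 using the Windows API.
std::string get_utf8(const std::wstring &wstr)
{
    if (wstr.empty()) return std::string();
    // first call: ask for the required buffer size in bytes
    int sz = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), nullptr, 0, nullptr, nullptr);
    std::string res(sz, 0);
    // second call: perform the conversion into the buffer
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &res[0], sz, nullptr, nullptr);
    return res;
}

// Convert a UTF-8 string to a UTF-16 wide string (the reverse direction).
std::wstring get_utf16(const std::string &str)
{
    if (str.empty()) return std::wstring();
    // first call: ask for the required buffer size in wide characters
    int sz = MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), nullptr, 0);
    std::wstring res(sz, 0);
    MultiByteToWideChar(CP_UTF8, 0, &str[0], (int)str.size(), &res[0], sz);
    return res;
}

int main()
{
    std::string utf8 = get_utf8(L"日本医療政策機構\nPhở\n");
    std::ofstream f("c:\\test\\ut8.txt");
    // write the three-byte UTF-8 BOM first, then the converted text
    const unsigned char bom[] = { 0xEF, 0xBB, 0xBF };
    f.write(reinterpret_cast<const char*>(bom), sizeof(bom));
    f << utf8;
    return 0;
}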
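Where the Windows API is not available, the same wide-to-UTF-8 conversion can be sketched with the standard library's `std::wstring_convert`. This is a swapped-in technique, not the answer's method: the name `get_utf8_portable` is hypothetical, chosen to mirror `get_utf8` above. `std::wstring_convert` and `std::codecvt_utf8` are deprecated since C++17, and with a 16-bit `wchar_t` the `std::codecvt_utf8_utf16<wchar_t>` facet would probably be needed for characters outside the Basic Multilingual Plane.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper, named to mirror get_utf8: converts a wide string to
// UTF-8 with the standard library instead of WideCharToMultiByte.
std::string get_utf8_portable(const std::wstring& wstr)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    // to_bytes throws std::range_error if the input cannot be converted
    return conv.to_bytes(wstr);
}
```

The result can be written after the BOM in the same way as the output of `get_utf8`, e.g. `f << get_utf8_portable(L"Phở\n");`.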

Barmak Shemirani
- Thanks Barmak. I am using Visual Studio 2013 and get an error on the `u8` literal because VS2013 does not understand it. I know it works in VS2015, but I want to do this in VS2013. – Alex Huynh Jun 03 '16 at 04:34
- I don't remember VS2013's capabilities. See the updated code; it should work for older compilers. – Barmak Shemirani Jun 03 '16 at 04:35