
I had the following code:

#include <string>
#include <fstream>
int main()
{
    std::string BOM = u8"\uFEFF";  // C2440
    std::ofstream f("utf8.txt");
    f << BOM;
}

which worked in C++14 and C++17.

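After switching to C++20, I get compiler error C2440. UTF-8 string literals seem to have undergone a breaking change: since C++20, a u8 literal is an array of char8_t rather than char, so it no longer converts to std::string.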

The MSDN article suggests a reinterpret_cast, but I can't apply that to a string (and I really doubt that such a cast would be a good idea):

std::string BOM = reinterpret_cast<std::string> (u8"\uFEFF");

How do I make my code work without fiddling with the UTF-8 BOM bytes 0xEF 0xBB 0xBF? I liked how \uFEFF worked nicely with all sorts of encodings.

I also tried:

#include <string>
#include <fstream>
int main()
{
    std::u8string BOM = u8"\uFEFF";  // ok now
    std::ofstream f("utf8.txt");
    f << BOM;                        // C2679, no matching << operator
}

In that case the u8string line compiles, but the stream output does not.

Thomas Weller

1 Answer


The accepted answer that @Bob_ pointed to works for my purposes.

So my code now looks like this:

#include <string>
#include <fstream>

// Copies the UTF-8 code units of s into a std::string, one char per char8_t.
std::string from_u8string(std::u8string const& s) {
    return { s.begin(), s.end() };
}

int main()
{
    std::string const BOM{ from_u8string(u8"\uFEFF") };
    std::ofstream f("utf8.txt");
    f << BOM << "你好";  // plain literal: its bytes follow the execution charset (MSVC: compile with /utf-8)
}
Thomas Weller