Copying a string with no particular encoding into a C string is quite easy:

#include <cstring>
#include <string>

auto to_c_str(std::string const& str) -> char* {
    auto dest = new char[str.size() + 1];  // +1 for the null terminator
    return std::strcpy(dest, str.c_str());
}

But how can I do that with a std::u8string? Is there an STL algorithm that can help with that?

I tried this:

auto to_c_str(std::u8string const& str) -> char8_t* {
    auto dest = new char8_t[str.size() + 1];
    return std::strcpy(dest, str.c_str());
}

But of course, std::strcpy only accepts char*, so this does not compile with char8_t strings.

Guillaume Racicot
    The way UTF-8 is defined, I don't see any problem with what you have. Might as well just use `memcpy` for performance. Or am I missing the point? – DeiDei Jul 02 '19 at 20:06

3 Answers

strcpy isn't needed since you already know the length of what you'd like to copy, so use memcpy:

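#include <cstring>
#include <string>

char8_t* to_c_str(std::u8string const& str) {
    auto dest = new char8_t[str.size() + 1];
    // size() + 1 also copies the null terminator stored after the last element
    return static_cast<char8_t*>(std::memcpy(dest, str.data(), str.size()+1));
}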

or std::copy:

char8_t* to_c_str(std::u8string const& str) {
    auto dest = new char8_t[str.size() + 1];
    std::copy(str.data(), str.data() + str.size() + 1, dest);
    return dest;
}

Since the u8string's own copy() member function does not write the null terminator, I would not use it when copying to a raw char8_t*.

Ted Lyngmo

In addition to using std::memcpy, you may use std::u8string::copy and std::copy.

auto to_c_str(std::u8string const& str) -> char8_t* {
    auto dest = new char8_t[str.size() + 1];
    str.copy(dest, str.size(), 0);
    dest[str.size()] = u8'\0';
    return dest;
}

auto to_c_str(std::u8string const& str) -> char8_t* {
    auto dest = new char8_t[str.size() + 1];
    std::copy(str.begin(), str.end(), dest);
    dest[str.size()] = u8'\0';
    return dest;
}
R Sahu

It seems to me like it would be easier to simply leverage the built-in copying and provide .data() to the C code:

std::u8string orig = u8"abc";
auto copy = orig;
c_api(copy.data(), copy.size());

By doing this, you let the copied string manage its own lifetime and have the size on equal footing with the data. This works uniformly for any char type of std::basic_string. As an added bonus, it also works for std::vector.

chris