94

A bit of background: my task required converting a UTF-8 XML file to UTF-16 (with a proper header, of course). So I searched for the usual ways of converting UTF-8 to UTF-16 and found that one should use the templates from <codecvt>.

But now that it is deprecated, what is the new common way of doing the same task?

(I don't mind using Boost at all, but otherwise I prefer to stay as close to the standard library as possible.)

login_not_failed

4 Answers

30

Don't worry about that.

According to the same information source:

"this library component should be retired to Annex D, along side <strstream>, until a suitable replacement is standardized."

So, you can still use it until a new standardized, more-secure version is done.

plasmacel
xmllmx
  • 24
    Unfortunately, that was wishful thinking. Deprecation [was applied to C++17](https://isocpp.org/files/papers/p0636r0.html). The recommendation apparently is: *"Users should use dedicated text-processing libraries instead."* Visual Studio 2017 will issue deprecation warnings when used. – IInspectable Jul 23 '18 at 20:57
  • 10
    What dedicated text-processing library though? – camccar Aug 26 '19 at 00:17
  • 2
    Let's hope so since it is a bit too easy to just deprecate something without coming with an alternative. – gast128 Sep 12 '19 at 14:50
  • 5
    What good are "standards" if they are changed on a whim without giving a suitable replacement? Maybe the "standards" are not so standard after all. Don't the "standards" committees take into account the man hours wasted as a result of deprecation without suitable replacement? – rxantos Mar 10 '22 at 20:12
  • @camccar ICU : https://icu.unicode.org/ – rahman Jan 10 '23 at 23:47
  • Recommend using this package: https://github.com/nemtrif/utfcpp – vy32 Apr 15 '23 at 23:48
29

std::codecvt template from <locale> itself isn't deprecated. For UTF-8 to UTF-16, there is still std::codecvt<char16_t, char, std::mbstate_t> specialization.

However, since std::wstring_convert and std::wbuffer_convert are deprecated along with the standard conversion facets, there isn't any easy way to convert strings using the facets.

So, as Bolas already answered: Implement it yourself (or you can use a third party library, as always) or keep using the deprecated API.
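
For illustration, here is a minimal sketch of driving the `std::codecvt<char16_t, char, std::mbstate_t>` facet by hand, without `std::wstring_convert`. The names `Utf8Utf16Facet` and `utf8_to_utf16_codecvt` are made up for this example. Note that, as far as I know, this particular specialization was itself deprecated in C++20 in favor of the `char8_t` one, so treat this as a stopgap rather than a long-term answer.

```cpp
#include <cwchar>      // std::mbstate_t
#include <locale>      // std::codecvt
#include <stdexcept>
#include <string>

// std::codecvt has a protected destructor; deriving from it gives a type
// that can be created directly, without going through a std::locale.
// (Hypothetical helper name, not part of the standard library.)
struct Utf8Utf16Facet : std::codecvt<char16_t, char, std::mbstate_t> {};

// Converts UTF-8 to UTF-16 by calling the facet's in() member directly.
std::u16string utf8_to_utf16_codecvt(const std::string& in)
{
    if (in.empty())
    {
        return std::u16string();
    }

    Utf8Utf16Facet facet;
    std::mbstate_t state{};

    // UTF-16 never needs more code units than the UTF-8 input has bytes.
    std::u16string out(in.size(), u'\0');
    const char* from_next = nullptr;
    char16_t* to_next = nullptr;

    const auto result = facet.in(state,
                                 in.data(), in.data() + in.size(), from_next,
                                 &out[0], &out[0] + out.size(), to_next);
    if (result != std::codecvt_base::ok)
    {
        // Covers both malformed input (error) and truncated input (partial).
        throw std::runtime_error("UTF-8 to UTF-16 conversion failed");
    }

    // Shrink the buffer to the number of code units actually written.
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling `utf8_to_utf16_codecvt(text)` returns the UTF-16 code units as a `std::u16string`; writing them to a file with a byte order mark is a separate step.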

eerorika
  • 11
    But according to [P0618](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html), *all* of the header is deprecated. Not just the typedefs; `std::codecvt` is deprecated in its entirety. – Nicol Bolas Mar 28 '17 at 14:18
  • 4
    @NicolBolas The proposal appears not to suggest any changes to `[locale.codecvt]`, where codecvt_base and codecvt of the `<locale>` header are defined. However, reading the document, I can see that w{string,buffer}_convert are deprecated as well, which as far as I know are the only standard facilities that actually use the codecvt facet. So, even if codecvt isn't deprecated, there isn't really any easy way to use it. Do you think that the omission of `std::codecvt` from the document is accidental? – eerorika Mar 28 '17 at 14:46
  • 5
    @user2079303 `basic_filebuf` uses it. – T.C. Mar 28 '17 at 16:46
  • 1
    [P0618](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0618r0.html) is just a proposal from one member of the standards committee, albeit an influential one. It says nothing about whether the proposal has been accepted; if it has been, it won't be deprecated until the next standard (probably in the mid-2020s) and likely won't be removed until about 2030. I imagine compilers will continue to support it for a while longer. – Richard Smith Oct 22 '17 at 17:53
  • 9
    @RichardSmith According to [P0636R0](https://isocpp.org/files/papers/p0636r0.html), P0618R0 was applied to C++17, which means that the deprecations have been in effect since that standard revision. – eerorika Oct 22 '17 at 18:22
  • As far as I can tell, the `wX_convert` tools are deprecated because they relied on the deprecated specialisations of `std::codecvt`, rather than the other way around. As such, if you are already using those specialisations, it _probably_ wouldn't cause any issues to use the converters along with them. – Justin Time - Reinstate Monica Jul 03 '19 at 22:58
17

Since nobody has really answered the question with usable replacement code, here is some, though it only works on Windows:

#include <string>
#include <stdexcept>
#include <Windows.h>

// Converts a UTF-8 encoded std::string to a UTF-16 std::wstring.
std::wstring string_to_wide_string(const std::string& string)
{
    if (string.empty())
    {
        return L"";
    }

    // First call, with no output buffer: returns the required length in wchar_t units.
    const auto size_needed = MultiByteToWideChar(CP_UTF8, 0, &string.at(0), (int)string.size(), nullptr, 0);
    if (size_needed <= 0)
    {
        throw std::runtime_error("MultiByteToWideChar() failed: " + std::to_string(size_needed));
    }

    // Second call: performs the actual conversion into the preallocated buffer.
    std::wstring result(size_needed, 0);
    MultiByteToWideChar(CP_UTF8, 0, &string.at(0), (int)string.size(), &result.at(0), size_needed);
    return result;
}

// Converts a UTF-16 std::wstring back to a UTF-8 encoded std::string.
std::string wide_string_to_string(const std::wstring& wide_string)
{
    if (wide_string.empty())
    {
        return "";
    }

    const auto size_needed = WideCharToMultiByte(CP_UTF8, 0, &wide_string.at(0), (int)wide_string.size(), nullptr, 0, nullptr, nullptr);
    if (size_needed <= 0)
    {
        throw std::runtime_error("WideCharToMultiByte() failed: " + std::to_string(size_needed));
    }

    std::string result(size_needed, 0);
    WideCharToMultiByte(CP_UTF8, 0, &wide_string.at(0), (int)wide_string.size(), &result.at(0), size_needed, nullptr, nullptr);
    return result;
}
BullyWiiPlaza
8

The new way is... you write it yourself. Or just rely on deprecated functionality. Hopefully, the standards committee won't actually remove codecvt until there is a functioning replacement.

But at present, there isn't one.
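
To give an idea of what "write it yourself" involves, here is a rough sketch of a hand-rolled, portable UTF-8 to UTF-16 converter, plus a helper that serializes the result as little-endian bytes with a byte order mark (the "header" the question asks about). The function names `utf8_to_utf16_manual` and `to_utf16le_with_bom` are invented for this example, and the code is an illustration under those assumptions rather than a vetted library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Decodes UTF-8 into UTF-16 code units; throws on malformed input.
std::u16string utf8_to_utf16_manual(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;       // code point being decoded
        std::size_t extra = 0;      // continuation bytes expected
        std::uint32_t min_cp = 0;   // smallest value valid for this length

        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min_cp = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates, and values beyond U+10FFFF.
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += extra + 1;
    }
    return out;
}

// Serializes code units as UTF-16LE bytes, prefixed with the BOM (FF FE).
std::string to_utf16le_with_bom(const std::u16string& units)
{
    std::string bytes = "\xFF\xFE";
    bytes.reserve(2 + units.size() * 2);
    for (const char16_t u : units)
    {
        bytes.push_back(static_cast<char>(u & 0xFF));  // low byte first
        bytes.push_back(static_cast<char>(u >> 8));    // then high byte
    }
    return bytes;
}
```

For the XML case, the bytes from `to_utf16le_with_bom(utf8_to_utf16_manual(xml))` would be written to a file opened in binary mode. Any `encoding="UTF-8"` declaration in the XML prolog has to be updated separately, since this code does not touch the content.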

Nicol Bolas
  • 15
    That's the problem: I need the most portable way of doing this. Of course there are always things like icu, iconv and various other libs, but there was a fairly straightforward way before, which involved three lines of code, and now it's a pure mess. – login_not_failed Mar 28 '17 at 15:58
  • 6
    @login_not_failed not "was", it still is, since it's not removed (and not going to be removed for a while) – Cubbi Mar 29 '17 at 03:14
  • 2
    I wrote C++ for a long time, then tried Rust, then came back to work on some C++ projects; I have to admit that Rust is a lot better. I don't understand why they deprecated this functionality without providing a replacement. – UltimaWeapon Aug 20 '22 at 06:52
  • @UltimaWeapon: Deprecation does not mean *removal*. You're *supposed* to deprecate bad APIs that you intend to replace, even if you don't have a replacement right now. That's what deprecation is *for*. – Nicol Bolas Aug 20 '22 at 13:19
  • https://github.com/nemtrif/utfcpp – vy32 Apr 15 '23 at 23:48