
I want to convert a wstring to UTF-8 encoding, but I want to use Linux's built-in functions.

Is there any built-in function that converts a wstring or wchar_t* to UTF-8 on Linux with a simple invocation?

Example:

wstring str = L"file_name.txt";
wstring mode = L"a"; // needs the L prefix to initialize a wstring
fopen([FUNCTION](str), [FUNCTION](mode)); // Simple invoke.
cout << [FUNCTION](str); // Simple invoke.
Quonux
Amir Saniyan

4 Answers


If/when your compiler supports enough of C++11, you could use `std::wstring_convert`:

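#include <codecvt>
#include <iostream>
#include <locale>
#include <string>
int main()
{
    // Converts between wchar_t strings and UTF-8 bytes (deprecated since C++17).
    std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
    std::wstring str = L"file_name.txt";
    std::cout << utf8_conv.to_bytes(str) << '\n'; // to_bytes returns a UTF-8 std::string
}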

Tested with clang++ 2.9/libc++ on Linux and Visual Studio 2010 on Windows.
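To map this onto the question's `fopen` example, the converter can be wrapped in small helpers. This is a sketch assuming C++11 through C++23 (the facility is deprecated in C++17 and removed in C++26); `to_utf8` and `from_utf8` are hypothetical names, while `to_bytes`/`from_bytes` are the real `std::wstring_convert` members.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: wide string -> UTF-8 bytes.
std::string to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// Hypothetical helper: UTF-8 bytes -> wide string (the reverse direction).
std::wstring from_utf8(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(bytes);
}
```

Usage would then be `std::fopen(to_utf8(str).c_str(), to_utf8(mode).c_str())`. On Linux wchar_t is 32 bits, so `codecvt_utf8<wchar_t>` covers all of Unicode; on Windows wchar_t is 16 bits and is treated as UCS-2, so text with surrogate pairs would need `codecvt_utf8_utf16<wchar_t>` instead.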

Cubbi
  • std::wbuffer_convert, std::wstring_convert, and the `<codecvt>` header (containing std::codecvt_mode, std::codecvt_utf8, std::codecvt_utf16, and std::codecvt_utf8_utf16) are deprecated in C++17. (The std::codecvt class template is NOT deprecated.) – A.Danesh Apr 12 '21 at 11:49
  • @A.Danesh it was an aspirational deprecation, like with strstreams that were deprecated in C++98, but are still a mandatory part of C++20 – Cubbi Apr 12 '21 at 14:18

The C++ language standard has no notion of explicit encodings. It only contains an opaque notion of a "system encoding", for which wchar_t is a "sufficiently large" type.

To convert from the opaque system encoding to an explicit external encoding, you must use an external library. The library of choice would be iconv() (from WCHAR_T to UTF-8), which is part of POSIX and available on many platforms; on Windows, the WideCharToMultiByte function can produce UTF-8 when given the CP_UTF8 code page.
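A minimal sketch of the iconv route, assuming glibc (or GNU libiconv), which accepts "WCHAR_T" as the name of the platform's wchar_t encoding; other iconv implementations may not recognize that name, and some declare the input pointer as `const char**`. The function name `wide_to_utf8_iconv` is hypothetical.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert wchar_t text to UTF-8 with POSIX iconv.
std::string wide_to_utf8_iconv(const std::wstring& wide) {
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T"); // (to, from)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // glibc's iconv takes a non-const char** for the input buffer.
    char* in = const_cast<char*>(reinterpret_cast<const char*>(wide.data()));
    std::size_t in_left = wide.size() * sizeof(wchar_t);

    // UTF-8 needs at most 4 bytes per wchar_t element.
    std::string out(wide.size() * 4, '\0');
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - out_left); // drop the unused tail
    return out;
}
```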

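C++11 adds new UTF-8 literals in the form of `std::string s = u8"Hello World: \U0010FFFF";`. Those are already in UTF-8, but they cannot interface with the opaque wstring other than in the way I described. (In C++20 a `u8` literal has type `const char8_t[]`, so it initializes a `std::u8string` rather than a `std::string`.)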
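A small sketch showing that a `u8` literal already holds UTF-8 bytes, written to compile both before and after C++20's switch to char8_t (detected here via the `__cpp_char8_t` feature-test macro). The function name `max_code_point_utf8` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns the UTF-8 bytes of U+10FFFF as a std::string.
std::string max_code_point_utf8() {
#if defined(__cpp_char8_t)
    // C++20: u8 literals are char8_t, so copy element by element into std::string.
    std::u8string s = u8"\U0010FFFF";
    return std::string(s.begin(), s.end());
#else
    // C++11..17: u8 literals are plain char arrays.
    return std::string(u8"\U0010FFFF");
#endif
}
```

The four bytes produced are F4 8F BF BF, the UTF-8 encoding of U+10FFFF.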

See this question for a bit more background.

Community
Kerrek SB
  • C++11's utf-8 strings can interface with wstrings through `wstring_convert` – Cubbi Sep 26 '11 at 22:16
  • @Cubbi: I remain unconvinced that that has anything to do with UTF8. It looks like a mere wrapper for `wcstombs`. (There's a header `` that looks more promising.) – Kerrek SB Sep 26 '11 at 23:22
  • `wstring_convert` is not related to `wcstombs`. It's a wrapper for codecvt facets, such as `codecvt_utf8`. – Cubbi Sep 27 '11 at 02:05
  • @Kerreck SB: I think I see your point: except for the scant functions there is no portable connection between the C++03's generic narrow-multibyte/wide conversions and C++11's explicit UTF-8/UTF-16/UTF-16le/UCS2/UTF-32/UCS4 conversions. Interesting observation. – Cubbi Sep 28 '11 at 23:18
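To illustrate the point that `wstring_convert` wraps a codecvt facet, here is a sketch that calls the `codecvt_utf8<wchar_t>` facet's `out()` member directly, which is roughly what `to_bytes` does. It assumes C++11 through C++23 (the facet is deprecated since C++17); `to_utf8_with_facet` is a hypothetical name.

```cpp
#include <codecvt>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: drive the codecvt_utf8 facet by hand.
std::string to_utf8_with_facet(const std::wstring& wide) {
    if (wide.empty()) return std::string();

    std::codecvt_utf8<wchar_t> facet; // public destructor, unlike std::codecvt
    std::mbstate_t state{};
    std::string out(wide.size() * static_cast<std::size_t>(facet.max_length()), '\0');

    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r =
        facet.out(state, wide.data(), wide.data() + wide.size(), from_next,
                  &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt_utf8 conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0])); // keep only written bytes
    return out;
}
```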

It's quite plausible that wcstombs will do what you need if what you actually want to do is convert from wide characters to the current locale's multibyte encoding (which is UTF-8 when a UTF-8 locale is active).
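A sketch of the wcstombs approach. The output is UTF-8 only if LC_CTYPE names a UTF-8 locale, for example after `std::setlocale(LC_CTYPE, "")` with `LANG=en_US.UTF-8` in the environment, or `std::setlocale(LC_CTYPE, "C.UTF-8")` where that locale exists; in the default "C" locale non-ASCII characters fail. The function name `wide_to_locale_mb` is hypothetical.

```cpp
#include <clocale>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert to the current locale's multibyte encoding.
std::string wide_to_locale_mb(const std::wstring& wide) {
    // A null destination asks wcstombs for the required length (excluding '\0').
    std::size_t len = std::wcstombs(nullptr, wide.c_str(), 0);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");

    std::string out(len, '\0');
    // Write exactly len bytes; no terminator is stored because the limit is reached.
    std::wcstombs(&out[0], wide.c_str(), len);
    return out;
}
```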

If not, you probably need to look at ICU, Boost or similar.

David Heffernan

Certainly there is no function built into Linux, because the name Linux refers to the kernel only, which doesn't have anything to do with it. I seriously doubt that the libc that comes with gcc has such a function, and

$ man -k utf

supports this theory. But there are plenty of good UTF-8 libraries around. I personally recommend the iconv library for such conversions.

thiton
  • your man search lies to you: Linux glibc has an iconv implementation: http://www.gnu.org/s/hello/manual/libc/glibc-iconv-Implementation.html – rubenvb Sep 19 '11 at 10:28