_bstr_t to UTF-8 possible?

Question

I have a _bstr_t string which contains Japanese text. I want to convert this string to a UTF-8 string which is defined as a char *.

Can I convert the _bstr_t string to char * (UTF-8) string without losing the Japanese characters?

sharptooth · Answer 1 · 2009-03-10T14:51:33.383

16

Use WideCharToMultiByte() – pass CP_UTF8 as the first parameter.

Beware that BSTR can be a null pointer and that corresponds to an empty string – treat this as a special case.
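As a rough sketch of that advice (not the answerer's own code), the helper below wraps the usual two-call pattern for WideCharToMultiByte() and returns the UTF-8 bytes in a std::string. The name WideToUtf8 and its parameters are hypothetical, and the block is guarded so that it only compiles on Windows.

```cpp
#ifdef _WIN32  // Win32-only sketch: WideCharToMultiByte is declared in <windows.h>
#include <windows.h>
#include <string>

// Hypothetical helper: converts `length` UTF-16 code units at `text` to UTF-8.
// A null pointer or zero length (for example a null BSTR) yields an empty string.
std::string WideToUtf8(const wchar_t* text, int length) {
    if (text == nullptr || length <= 0) {
        return std::string();
    }
    // First call: ask how many bytes the UTF-8 form needs.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, text, length,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) {
        return std::string();  // conversion failed; GetLastError() has details
    }
    std::string utf8(static_cast<size_t>(bytes), '\0');
    // Second call: write the UTF-8 bytes into the string's buffer.
    WideCharToMultiByte(CP_UTF8, 0, text, length,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}
#endif
```

With a _bstr_t named b, a call would presumably look like WideToUtf8(static_cast<const wchar_t*>(b), static_cast<int>(b.length())). Passing the length explicitly, rather than -1, means no terminating null is counted in the result, so the string's data and size can be handed to a byte-oriented API as they are.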

edited Mar 10 '09 at 14:51

answered Mar 09 '09 at 12:43

cdiggins · Answer 2 · 2017-01-10T19:38:32.387

Here is some code that should do the conversion.

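#include <windows.h>
#include <cstdio>

// Prints a UTF-16 string (e.g. the wchar_t* held by a _bstr_t) as UTF-8.
void PrintUtf8(const wchar_t* value) {
    if (value == nullptr) {
        return;  // a null BSTR is an empty string: nothing to print
    }
    // First call: get the required size in bytes, including the terminator.
    int n = WideCharToMultiByte(CP_UTF8, 0, value, -1, nullptr, 0, nullptr, nullptr);
    if (n <= 0) {
        return;  // conversion failed
    }
    char* buffer = new char[n];
    // Second call: convert into the buffer.
    WideCharToMultiByte(CP_UTF8, 0, value, -1, buffer, n, nullptr, nullptr);
    printf("%s", buffer);
    delete[] buffer;  // array form, to match new[]
}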

score -1 · Answer 3 · answered Mar 09 '09 at 12:44

Very handy MSDN reference for this sort of thing: http://msdn.microsoft.com/en-us/library/ms235631(VS.80).aspx

I think you need to go to wchar_t* since char* will lose the Unicode stuff, although I'm not sure.

// convert_from_bstr_t.cpp
// compile with: /clr /link comsuppw.lib

#include <iostream>
#include <stdlib.h>
#include <string>

#include "atlbase.h"
#include "atlstr.h"
#include "comutil.h"

using namespace std;
using namespace System;

int main()
{
    _bstr_t orig("Hello, World!");
    wcout << orig << " (_bstr_t)" << endl;

    // Convert to a char*
    const size_t newsize = 100;
    char nstring[newsize];
    strcpy_s(nstring, (char *)orig);
    strcat_s(nstring, " (char *)");
    cout << nstring << endl;

    // Convert to a wchar_t*
    wchar_t wcstring[newsize];
    wcscpy_s(wcstring, (wchar_t *)orig);
    wcscat_s(wcstring, L" (wchar_t *)");
    wcout << wcstring << endl;

    // Convert to a CComBSTR
    CComBSTR ccombstr((char *)orig);
    if (ccombstr.Append(L" (CComBSTR)") == S_OK)
    {
        CW2A printstr(ccombstr);
        cout << printstr << endl;
    }

    // Convert to a CString
    CString cstring((char *)orig);
    cstring += " (CString)";
    cout << cstring << endl;

    // Convert to a basic_string
    string basicstring((char *)orig);
    basicstring += " (basic_string)";
    cout << basicstring << endl;

    // Convert to a System::String
    String ^systemstring = gcnew String((char *)orig);
    systemstring += " (System::String)";
    Console::WriteLine("{0}", systemstring);
    delete systemstring;
}

Thanks for your reply, Nick. The problem is that I want to send this _bstr_t content over a Windows socket, which only accepts char* data (see the WSABUF structure in ws2def.h), so a wchar_t* won't do. Is there a wide-char version of the _WSABUF structure? — Manav Sharma, Mar 09 '09 at 12:53
Windows Sockets don't care what data you send. In this case you can just reinterpret_cast to char* and be fine. — sharptooth, Mar 09 '09 at 13:00
Just be careful with the number of bytes (it's the number of UTF-16 code units times sizeof(WCHAR)) and with null BSTRs. — sharptooth, Mar 09 '09 at 13:01
Although Windows Sockets don't care what data is sent, if the destination needs to understand the data and uses a different byte order, it is better to use UTF-8, especially in a mixed environment where systems with both byte orders are used. — Afriza N. Arief, Jan 24 '13 at 06:06
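
To illustrate the byte-order point raised in the comments, here is a small portable sketch. It uses a hand-written UTF-16 to UTF-8 encoder purely for demonstration (on Windows, WideCharToMultiByte() with CP_UTF8 would do this job); the names EncodeUtf8 and RawUtf16Bytes are hypothetical, and unpaired surrogates are not validated.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hand-written UTF-16 -> UTF-8 encoder, for illustration only.
std::string EncodeUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high/low surrogate pair into one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The raw bytes of a UTF-16 string as a little- or big-endian machine stores them.
std::string RawUtf16Bytes(const std::u16string& in, bool bigEndian) {
    std::string out;
    for (char16_t unit : in) {
        char hi = static_cast<char>(unit >> 8);
        char lo = static_cast<char>(unit & 0xFF);
        out += bigEndian ? hi : lo;
        out += bigEndian ? lo : hi;
    }
    return out;
}
```

For the single character U+65E5, RawUtf16Bytes gives the bytes E5 65 in little-endian order and 65 E5 in big-endian order, so a receiver has to know which one the sender used. EncodeUtf8 gives E6 97 A5 regardless of the machine, which is why UTF-8 is the safer choice on the wire.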
