9

I have a _bstr_t string which contains Japanese text. I want to convert this string to a UTF-8 string which is defined as a char *.

Can I convert the _bstr_t string to char * (UTF-8) string without losing the Japanese characters?

JoeG
Manav Sharma

3 Answers

16

Use WideCharToMultiByte() – pass CP_UTF8 as the first parameter.

Beware that BSTR can be a null pointer and that corresponds to an empty string – treat this as a special case.
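A minimal sketch of that advice, assuming a Windows build with `<comutil.h>` available. The helper name `BstrToUtf8` is hypothetical and not part of any library; it passes the string length explicitly, so the result carries no extra terminating null, and it maps a null BSTR to an empty string.

```cpp
#ifdef _WIN32  // Windows-only: relies on <windows.h> and <comutil.h>
#include <windows.h>
#include <comutil.h>
#include <string>

// Hypothetical helper (not from the thread): converts a _bstr_t to UTF-8.
std::string BstrToUtf8(const _bstr_t& text) {
    // A null BSTR is a valid empty string, so handle it before converting.
    const wchar_t* wide = static_cast<const wchar_t*>(text);
    const int wideLen = static_cast<int>(text.length());
    if (wide == nullptr || wideLen == 0) {
        return std::string();
    }

    // First call: ask how many bytes the UTF-8 form needs.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide, wideLen,
                                          nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) {
        return std::string();
    }

    // Second call: convert into a buffer of exactly that size.
    std::string utf8(static_cast<size_t>(bytes), '\0');
    const int written = WideCharToMultiByte(CP_UTF8, 0, wide, wideLen,
                                            &utf8[0], bytes, nullptr, nullptr);
    if (written != bytes) {
        return std::string();
    }
    return utf8;
}
#endif
```

The returned `std::string` owns the bytes; `utf8.c_str()` gives the `char *` view and `utf8.size()` the byte count, which will usually be larger than the character count for Japanese text.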

sharptooth
1

Here is some code that should do the conversion.

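void PrintUtf8(const wchar_t* value) {
    // A null pointer (e.g. an empty BSTR) means there is nothing to print.
    if (value == nullptr) {
        return;
    }
    // First call: query the required size in bytes, including the terminating null.
    int n = WideCharToMultiByte(CP_UTF8, 0, value, -1, nullptr, 0, nullptr, nullptr);
    if (n <= 0) {
        return;
    }
    char* buffer = new char[n];
    // Second call: convert into the buffer; print only if the conversion succeeded.
    if (WideCharToMultiByte(CP_UTF8, 0, value, -1, buffer, n, nullptr, nullptr) > 0) {
        printf("%s", buffer);
    }
    delete[] buffer;  // memory from new[] must be released with delete[]
}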
cdiggins
-1

Very handy MSDN reference for this sort of thing: http://msdn.microsoft.com/en-us/library/ms235631(VS.80).aspx

I think you need to go to wchar_t* since char* will lose the Unicode stuff, although I'm not sure.

// convert_from_bstr_t.cpp
// compile with: /clr /link comsuppw.lib

#include <iostream>
#include <stdlib.h>
#include <string>

#include "atlbase.h"
#include "atlstr.h"
#include "comutil.h"

using namespace std;
using namespace System;

int main()
{
    _bstr_t orig("Hello, World!");
    wcout << orig << " (_bstr_t)" << endl;

    // Convert to a char*
    const size_t newsize = 100;
    char nstring[newsize];
    strcpy_s(nstring, (char *)orig);
    strcat_s(nstring, " (char *)");
    cout << nstring << endl;

    // Convert to a wchar_t*
    wchar_t wcstring[newsize];
    wcscpy_s(wcstring, (wchar_t *)orig);
    wcscat_s(wcstring, L" (wchar_t *)");
    wcout << wcstring << endl;

    // Convert to a CComBSTR
    CComBSTR ccombstr((char *)orig);
    if (ccombstr.Append(L" (CComBSTR)") == S_OK)
    {
        CW2A printstr(ccombstr);
        cout << printstr << endl;
    }

    // Convert to a CString
    CString cstring((char *)orig);
    cstring += " (CString)";
    cout << cstring << endl;

    // Convert to a basic_string
    string basicstring((char *)orig);
    basicstring += " (basic_string)";
    cout << basicstring << endl;

    // Convert to a System::String
    String ^systemstring = gcnew String((char *)orig);
    systemstring += " (System::String)";
    Console::WriteLine("{0}", systemstring);
    delete systemstring;
}
Nick
  • Thanks for your reply, Nick. The problem is that I want to send this _bstr_t content via a Windows socket, which only allows char* data to be sent (please check the WSABUF structure in the ws2def.h file). So a wchar_t* won't do. Is there a wide char version of the _WSABUF structure? – Manav Sharma Mar 09 '09 at 12:53
  • 2
    Windows Sockets don't care what data you send. In this case you can just reinterpret_cast to char* and be fine. – sharptooth Mar 09 '09 at 13:00
  • Just don't get the number of bytes wrong (it's the number of Unicode characters times sizeof(WCHAR)), and watch out for null BSTRs. – sharptooth Mar 09 '09 at 13:01
  • Although Windows Sockets don't care what data is sent, if the destination needs to understand the data and uses a different byte ordering, it is better to use UTF-8, especially in a mixed environment where systems with both byte orderings are used. – Afriza N. Arief Jan 24 '13 at 06:06
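To illustrate why UTF-8 sidesteps the byte-ordering issue raised in the last comment, here is a portable sketch of the UTF-16 to UTF-8 encoding itself. The function name `Utf16ToUtf8` is hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch, not something the thread specifies. The output is a plain sequence of bytes, so it reads the same on big-endian and little-endian machines.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes UTF-16 code units as UTF-8 bytes.
// Assumption of this sketch: unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }

        if (cp < 0x80) {            // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: covers most Japanese text
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: characters outside the BMP
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the three characters of 日本語 come out as nine bytes, three per character. A receiver on any platform can decode them without knowing the sender's byte order.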