
Possible Duplicate:
UTF8 to/from wide char conversion in STL

I know how to convert UTF8 to std::wstring using MultiByteToWideChar:

std::wstring utf8to16( const char* src )
{
    std::vector<wchar_t> buffer;
    buffer.resize(MultiByteToWideChar(CP_UTF8, 0, src, -1, 0, 0));
    MultiByteToWideChar(CP_UTF8, 0, src, -1, &buffer[0], buffer.size());
    return &buffer[0];
}

But it is Windows-specific. Is there a cross-platform C++ function that does the same thing, using only stdio or iostream?

anthony sottile
sashoalm
  • I would suggest you look into something like [Boost locale](http://www.boost.org/libs/locale/). – Some programmer dude Jan 30 '13 at 10:19
  • I hope your code is just a simple sample, not production code: it doesn't check for errors from the `MultiByteToWideChar()` calls. Moreover, you can use a `std::wstring` directly inside the function body, instead of allocating memory in a separate `std::vector` and then _deep-copying_ it to a `std::wstring`. – Mr.C64 Jan 30 '13 at 10:57
  • Answers to http://stackoverflow.com/questions/7232710/convert-between-string-u16string-u32string show how you can do this using the std::wstring_convert class and std::codecvt locale facets – JoergB Jan 30 '13 at 11:50

1 Answer


I suggest using the utf8-cpp library. It is simple and to the point when it comes to UTF-8 strings.

This code reads a UTF-8 file, creates a UTF-16 version of each line, then converts it back to UTF-8:

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include "utf8.h"
using namespace std;
int main(int argc, char** argv)
{
    if (argc != 2) {
        cout << "\nUsage: docsample filename\n";
        return 0;
    }

    const char* test_file_path = argv[1];
    // Open the test file (contains UTF-8 encoded text)
    ifstream fs8(test_file_path);
    if (!fs8.is_open()) {
        cout << "Could not open " << test_file_path << endl;
        return 0;
    }

    string line;
    while (getline(fs8, line)) {

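        // Find the end of the valid UTF-8 prefix (utf8to16 throws on invalid input)
        string::iterator end_it = utf8::find_invalid(line.begin(), line.end());

        // Convert the valid part of the line to utf-16
        vector<unsigned short> utf16line;
        utf8::utf8to16(line.begin(), end_it, back_inserter(utf16line));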

        // And back to utf-8
        string utf8line; 
        utf8::utf16to8(utf16line.begin(), utf16line.end(), back_inserter(utf8line));
    }
    return 0;
}
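
If pulling in a library is not an option, the standard-library route JoergB mentions in the comments (`std::wstring_convert` with a `std::codecvt` facet) can be sketched roughly as below. This is only a minimal sketch under a few assumptions: it needs C++11, both `std::wstring_convert` and `<codecvt>` are deprecated since C++17, and the helper names `utf8_to_utf16` and `utf16_to_utf8` are made up for illustration. It targets `std::u16string` rather than `std::wstring`, because `wchar_t` is 16 bits on Windows but typically 32 bits on Linux.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Hypothetical helper, not part of any library.
// Converts UTF-8 bytes to UTF-16 code units.
// from_bytes throws std::range_error if the conversion fails.
std::u16string utf8_to_utf16(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical helper, not part of any library.
// Converts UTF-16 code units back to UTF-8 bytes.
// to_bytes throws std::range_error if the conversion fails.
std::string utf16_to_utf8(const std::u16string& utf16)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

With these, `utf16_to_utf8(utf8_to_utf16(line))` should round-trip a valid UTF-8 line, much like the utf8-cpp loop above, and a character outside the Basic Multilingual Plane comes out as a surrogate pair (two `char16_t` units).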
Max