
I'm looking into some of my old (and exclusively Win32-oriented) code and thinking about making it more modern/portable, i.e. reimplementing some widely reusable parts in C++11. One of these parts is converting between UTF-8 and UTF-16. In the Win32 API I use MultiByteToWideChar/WideCharToMultiByte, and I tried porting that code to C++11 using the sample code from here: https://stackoverflow.com/a/14809553. The result is

Release build (compiled by MSVS 2013, run on Core i7 3610QM)

stdlib                   = 1587.2 ms
Win32                    =  127.2 ms

Debug build

stdlib                   = 5733.8 ms
Win32                    =  127.2 ms

The question is: is there something wrong with the code? And if the code seems OK, is there a good reason for such a performance difference?

Test code is below:

#include <cstdio>    // printf, fopen, fwrite, sprintf_s (MSVC)
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <clocale>
#include <codecvt>
#include <windows.h> // QueryPerformanceCounter, MultiByteToWideChar, ...

#define XU_BEGIN_TIMER(NAME)                       \
    {                                           \
        LARGE_INTEGER   __freq;                 \
        LARGE_INTEGER   __t0;                   \
        LARGE_INTEGER   __t1;                   \
        double          __tms;                  \
        const char*     __tname = NAME;         \
        char            __tbuf[0xff];           \
                                                \
        QueryPerformanceFrequency(&__freq);     \
        QueryPerformanceCounter(&__t0);         

#define XU_END_TIMER()                             \
        QueryPerformanceCounter(&__t1);         \
        __tms = (__t1.QuadPart - __t0.QuadPart) * 1000.0 / __freq.QuadPart; \
        sprintf_s(__tbuf, sizeof(__tbuf), "    %-24s = %6.1f ms\n", __tname, __tms ); \
        OutputDebugStringA(__tbuf);             \
        printf(__tbuf);                         \
    }   

// Slurp the test file as raw UTF-8 bytes
std::string read_utf8() {
    std::ifstream infile("C:/temp/UTF-8-demo.txt");
    std::string fileData((std::istreambuf_iterator<char>(infile)),
                         std::istreambuf_iterator<char>());
    infile.close();

    return fileData;
}

void testMethod() {
    std::setlocale(LC_ALL, "en_US.UTF-8");
    std::string source = read_utf8();
    {
        std::string utf8;

        XU_BEGIN_TIMER("stdlib") {
            for( int i = 0; i < 1000; i++ ) {
                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf16;
                std::u16string utf16 = convert2utf16.from_bytes(source);

                std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert2utf8;
                utf8 = convert2utf8.to_bytes(utf16);
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-std.dat", "wb");
        fwrite(utf8.c_str(), 1, utf8.length(), output);
        fclose(output);
    }

    char* utf8 = NULL;
    int cchA = 0;

    {
        XU_BEGIN_TIMER("Win32") {
            for( int i = 0; i < 1000; i++ ) {
                // Buffers sized to the input length; safe here only because
                // a UTF-8 -> UTF-16 -> UTF-8 round trip cannot grow the data
                // (see Remy Lebeau's comment about the two-call pattern below)
                WCHAR* utf16 = new WCHAR[source.length() + 1];
                int cchW;
                utf8 = new char[source.length() + 1];

                // UTF-8 -> UTF-16
                cchW = MultiByteToWideChar(
                    CP_UTF8, 0, source.c_str(), source.length(),
                    utf16, source.length() + 1);

                // UTF-16 -> UTF-8
                cchA = WideCharToMultiByte(
                    CP_UTF8, 0, utf16, cchW,
                    utf8, source.length() + 1, NULL, NULL);

                delete[] utf16;
                if( i != 999 )      // keep the last buffer for the fwrite below
                    delete[] utf8;
            }
        } XU_END_TIMER();

        FILE* output = fopen("c:\\temp\\utf8-win.dat", "wb");
        fwrite(utf8, 1, cchA, output);
        fclose(output);

        delete[] utf8;
    }
}
Xtra Coder
  • Your Win32 code is not allocating buffers correctly. UTF-8 and UTF-16 do not have a 1-to-1 relationship between their data lengths. You should be calling `MultiByteToWideChar`/`WideCharToMultiByte` one time to calculate the necessary buffer size, then allocate the buffer, then call again to do the actual conversion (a sketch of that two-call pattern follows these comments). So that affects timing a little bit. – Remy Lebeau Oct 04 '14 at 20:13
  • Win32 since Vista uses SSE internally to great effect, something very few UTF transcoders do. It'll be hard to beat. – Cory Nelson Oct 04 '14 at 20:15
  • @Remy Lebeau: yes, if I do NOT want to allocate extra (really temporary) memory, I need to call MultiByteToWideChar/WideCharToMultiByte one more time - this would bring the Win32 use case to something around 127*2 = 250 ms. That is still 6.5 times quicker than stdlib. – Xtra Coder Oct 04 '14 at 20:31
  • @CoryNelson: That's really interesting, do you have a link for that? – user541686 Oct 04 '14 at 22:09
  • Well, this is sad. All you can do is shame these guys into making it better. Do so by posting this at connect.microsoft.com – Hans Passant Oct 04 '14 at 22:24
  • @Mehrdad no link. I was working on heavily optimizing my own UTF-8 decoder -- I had better perf than everything I tested against except for Windows, which stayed about 2x faster depending on input. It drove me nuts, so I decompiled their binaries to have a look. – Cory Nelson Oct 04 '14 at 23:27
  • @CoryNelson, can you write down your comment as a separate answer so that I can accept it? I feel there is no better choice out there and ... I'll not switch to codecvt for now. – Xtra Coder Oct 11 '14 at 20:46
  • Thank you! I had exactly the same thoughts! I moved to std::codecvt in order to make my code standard and portable, until I saw how POOR std::codecvt is! Plus, the Win32 functions managed to convert some strings for which std::codecvt simply threw an "invalid encoding" exception. – David Haim Aug 24 '15 at 10:46
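
A minimal sketch of the two-call sizing pattern from Remy Lebeau's comment, assuming the same CP_UTF8 conversion as the question's code (the function name is illustrative, not part of any API):

std::wstring utf8_to_utf16_sized(const std::string& src) {
    // First call: NULL output buffer, so the return value is the
    // required output length in WCHARs
    int cchW = MultiByteToWideChar(
        CP_UTF8, 0, src.c_str(), (int)src.length(), NULL, 0);
    std::wstring utf16(cchW, L'\0');
    // Second call: perform the actual conversion into the sized buffer
    MultiByteToWideChar(
        CP_UTF8, 0, src.c_str(), (int)src.length(), &utf16[0], cchW);
    return utf16;
}

Note this doubles the number of API calls, which is what the 127*2 estimate in the comments above refers to.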

2 Answers


In my own testing, I found that the constructor call for wstring_convert has massive overhead, at least on Windows. As other answers suggest, you'll probably struggle to beat the native Windows implementation, but try modifying your code to construct the converter outside the loop, as sketched below. I expect you'll see an improvement of between 5x and 20x, particularly in a debug build.
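
A hedged sketch of that change against the question's stdlib loop (same codecvt_utf8_utf16 facet; one converter object works for both directions, since from_bytes and to_bytes are both members):

// Construct the (expensive) converter once, outside the timed loop
std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> convert;
std::string utf8;
for (int i = 0; i < 1000; i++) {
    std::u16string utf16 = convert.from_bytes(source);  // UTF-8 -> UTF-16
    utf8 = convert.to_bytes(utf16);                      // UTF-16 -> UTF-8
}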

James Davies

Win32's UTF-8 transcode since Vista uses SSE internally to great effect, something very few other UTF transcoders do. I suspect it will be impossible to beat with even the most highly optimized portable code.

However, the number you've given for codecvt is simply exceptionally slow if it's taking over 10x the time, and it suggests a naive implementation. While writing my own UTF-8 decoder, I was able to get within 2-3x the perf of Win32. There's a lot of room for improvement here, but you'd need to implement a custom codecvt to get it.
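
To illustrate the kind of technique involved (this is not Cory Nelson's actual decoder, and the function name is made up): an SSE2 sketch of the ASCII fast path that vectorized transcoders rely on. A real decoder would branch to full multi-byte sequence handling instead of the scalar fallback shown here.

#include <emmintrin.h>  // SSE2 intrinsics

// Widen 16 ASCII bytes per iteration; bail out on the first non-ASCII chunk
std::u16string utf8_to_utf16_ascii_fastpath(const std::string& in) {
    std::u16string out;
    out.reserve(in.size());
    const unsigned char* p = (const unsigned char*)in.data();
    size_t i = 0, n = in.size();
    const __m128i zero = _mm_setzero_si128();
    while (i + 16 <= n) {
        __m128i chunk = _mm_loadu_si128((const __m128i*)(p + i));
        // movemask sets one bit per byte with the high bit set (non-ASCII)
        if (_mm_movemask_epi8(chunk) != 0)
            break;  // a full decoder would decode multi-byte sequences here
        // interleave with zero bytes: 16 bytes -> 16 little-endian UTF-16 units
        char16_t buf[16];
        _mm_storeu_si128((__m128i*)buf,       _mm_unpacklo_epi8(chunk, zero));
        _mm_storeu_si128((__m128i*)(buf + 8), _mm_unpackhi_epi8(chunk, zero));
        out.append(buf, 16);
        i += 16;
    }
    for (; i < n && p[i] < 0x80; ++i)  // scalar tail (ASCII-only in this sketch)
        out.push_back((char16_t)p[i]);
    return out;
}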

Cory Nelson