2

I'm developing a game for Windows for learning purposes (I'm learning DirectX). I would like it to have Unicode support.

Reading this question I learned that Windows uses wchar_t, which is UTF-16. I want my game to have Lua scripting support, and Lua doesn't really like Unicode much. It simply treats strings as a "stream of bytes"; this works well enough for UTF-8, but UTF-16 would be virtually impossible to use.

Long story short: Windows wants UTF-16, Lua wants UTF-8.

So I thought, let's just use UTF-8 with normal char* and string! .length() will count bytes rather than characters, but who cares? However, it doesn't work:

const char test_utf8[] = { 111, 108, (char)0xc3, (char)0xa9, 0 }; // UTF-8 for olé
mFont->DrawTextA(0, test_utf8, -1, &R, DT_NOCLIP, BLACK);
    /* DrawText is a D3DX (ID3DXFont) method to, well, draw text.
     * It's like MessageBox: it is a define for either DrawTextA
     * or DrawTextW, depending on whether UNICODE is defined. Here
     * we use DrawTextA, since we are passing a normal char*. */

This prints olÃ© instead of olé. In other words, it doesn't appear to use UTF-8 but rather ISO-8859-1.
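
To see where the two extra characters come from, it helps to decode the same four bytes the way a single-byte code page would. The sketch below is illustrative only: `latin1_bytes_as_utf8` is a hypothetical helper, not part of DirectX or Windows, and it assumes ISO-8859-1 (the two bytes involved here map to the same characters in Windows-1252).

```cpp
#include <string>

// Hypothetical helper: treat every input byte as one ISO-8859-1 character
// and re-encode the result as UTF-8 so the misreading can be displayed.
std::string latin1_bytes_as_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);                // ASCII is unchanged
        } else {
            // U+0080..U+00FF take two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Calling `latin1_bytes_as_utf8("ol\xc3\xa9")` yields the UTF-8 encoding of olÃ©: the two bytes that together encode é were each read as a separate character (0xC3 as Ã, 0xA9 as ©).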

So, what can I do? I can think of the following:

  1. Abandon the idea of Unicode; use ISO-8859-1 and be happy (this is what World of Warcraft does, at least for the enUS version).
  2. Convert every single string from UTF-8 to UTF-16 on every frame. I'm worried about performance, though: the conversion is O(N) and would run 60+ times a second for each string, so I expect it to be fairly slow.
  3. Keep a UTF-16 copy of each Lua string. This is a huge waste of memory and very difficult to implement (keeping the UTF-16 strings up to date when they change in Lua, etc.).
Andreas Bonini
  • You need to convert the UTF-8 char* (or std::string) into wide string before using it with Windows APIs. Fortunately the API contains a function for this, but I cannot remember what it is called. – Tronic Feb 26 '10 at 22:14
  • Why can't you keep all of your strings in UTF-16 format? You want to keep strings out of your code in resources anyway, right? – John Knoeller Feb 26 '10 at 22:20
  • @John: these are probably strings being generated within Lua scripts (and thus beyond the control of the base program). – Amber Feb 26 '10 at 22:34

4 Answers

5

It doesn't use ISO-8859-1 either; it uses your system's default code page. You can convert the string to UTF-16 yourself and call the wide version, DrawTextW(). If your class library doesn't offer a conversion, you can use MultiByteToWideChar() with CP_UTF8.
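
A minimal sketch of that conversion follows. The names `utf8_to_wide` and `utf8_to_utf16` are made up for illustration. The first function is the Windows path and uses the usual two-call pattern for MultiByteToWideChar() (query the size, then convert). The second is a portable illustration of what the conversion does; it assumes well-formed UTF-8 and performs no validation.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Windows path: CP_UTF8 tells MultiByteToWideChar that the input is UTF-8
// rather than text in the system code page.
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units the result needs.
    const int out_len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, nullptr, 0);
    if (out_len <= 0) return std::wstring();
    std::wstring wide(static_cast<std::size_t>(out_len), L'\0');
    // Second call: convert into the buffer.
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &wide[0], out_len);
    return wide;
}
#endif

// Portable illustration of the same conversion. Assumes well-formed UTF-8:
// continuation bytes are not checked and overlong forms are not rejected.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)      { cp = b0;        len = 1; }
        else if (b0 < 0xE0) { cp = b0 & 0x1F; len = 2; }
        else if (b0 < 0xF0) { cp = b0 & 0x0F; len = 3; }
        else                { cp = b0 & 0x07; len = 4; }
        if (i + len > utf8.size()) break;  // truncated sequence: stop
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;  // outside the BMP: emit a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

With the question's code, the Windows result would be passed along the lines of `mFont->DrawTextW(0, utf8_to_wide(test_utf8).c_str(), -1, &R, DT_NOCLIP, BLACK);`.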

Hans Passant
2

I wouldn't be shocked if WoW doesn't use the DirectX text drawing methods. Having your own custom text drawing solution gives you a lot more flexibility in which encodings you support. It isn't too hard.

Torlack
  • That's actually not a bad idea. Do you have any links to papers/tutorials/etc that explain how to do this? (PS ran out of votes, will upvote in 60 minutes) – Andreas Bonini Feb 26 '10 at 23:01
2

You can get Lua to cache your conversions to UTF-16:

-- Indexing utf16 with a UTF-8 string converts it once and memoizes the result.
utf16 = setmetatable ( {} , { __index = function ( t , k )
        local utf16str = my_conversion_func_to_utf16 ( k )
        rawset ( t , k , utf16str )
        return utf16str
    end } )

Then have all your functions take only the UTF-16 string type (which could be a Lua string or some sort of userdata, such as a wrapper around your wchar_t array).
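
The same memoization can also live on the C++ side. Because Lua strings are immutable values, a cache keyed on the string's contents cannot go stale; a "changed" string is simply a different key. The sketch below is one possible shape, not a tested engine component: `Utf16Cache` and its `Converter` parameter are hypothetical names, and `std::u16string` stands in for whatever wide string type the renderer expects.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: remembers the UTF-16 form of each UTF-8 string so the
// conversion runs once per distinct string instead of once per frame.
class Utf16Cache {
public:
    using Converter = std::function<std::u16string(const std::string&)>;

    explicit Utf16Cache(Converter convert) : convert_(std::move(convert)) {}

    // Returns the cached conversion, computing and storing it on first use.
    const std::u16string& get(const std::string& utf8) {
        auto it = cache_.find(utf8);
        if (it == cache_.end()) {
            it = cache_.emplace(utf8, convert_(utf8)).first;
        }
        return it->second;
    }

    std::size_t size() const { return cache_.size(); }

    // Drop everything, for example on a level change, to bound memory use.
    void clear() { cache_.clear(); }

private:
    Converter convert_;
    std::unordered_map<std::string, std::u16string> cache_;
};
```

The trade-off is memory for time: every distinct string ever drawn stays resident until `clear()` is called, so strings that change every frame (a timer, a frame counter) gain nothing from caching.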

I can help more if you don't understand...

daurnimator
1

한국어/ì¡°ì„ ë§. I see this all the time in StarCraft because it doesn't have proper support for Unicode.

Fight the good fight! Use UTF-8. Convert to UTF-16 every frame (unless there's a better way to do it mentioned in the docs which I'm too lazy to look at). Don't worry about performance here until it becomes a problem!

Joey Adams