25

Is it possible to use Windows API ANSI functions with UTF-8 strings?

For example, say I have a path encoded in UTF-8. Can I call CreateDirectoryA or CreateFileA and use a UTF-8 path, or do I have to perform some conversion before calling the functions?

James Ko
krebstar
  • 2
    Yikes. Why would anyone want that? I think we're way past Windows ME now (which was the last Windows version ever to need the ANSI APIs). They should die out already, especially for newly-developed applications. – Joey Jan 12 '12 at 07:22
  • 2
    From where are you obtaining UTF-8 strings? It's much easier to convert your application to work entirely with UTF-16 strings, as the so-called wide-versions Windows API functions require. And as Joey says, *always* call the wide versions (with the `W` suffix), not the ANSI versions. Those have been obsolete for decades. – Cody Gray - on strike Jan 12 '12 at 11:27
  • 13
    @Joey: Because an awful lot of C(++) libraries (including the standard library!) prefer to work with `char`-based strings rather than `wchar_t`-based strings. If Windows fully supported UTF-8, then you could just use UTF-8 throughout your program instead of having to convert between UTF-8 and UTF-16 all the time. – dan04 Jan 12 '12 at 16:39
  • 1
    @dan04: UTF-16 is the best Unicode encoding for processing (UTF-8 is OK for storage), see this interesting article: http://unicode.org/notes/tn12/ (note also that both C# and Java use UTF-16 encoding for their string classes). –  Jan 14 '12 at 22:22
  • 8
    @user1149224 UTF-16-processing code is no less complex than UTF-8-processing code. UTF-32-processing code is much simpler. – user253751 Apr 14 '14 at 06:48

3 Answers

17

No. Use MultiByteToWideChar to convert UTF-8 to UTF-16 and then call the wide character APIs such as CreateDirectoryW or CreateFileW.

Kotori0
casablanca
  • 13
    I would also add that since Windows uses UTF-16 exclusively, it might be best for you to follow suit and work with UTF-16 for the most part, and only do the conversion to UTF-8 when you need to read/write from external sources. – casablanca Jan 12 '12 at 07:09
  • 9
    @casablanca: Another approach that's been advocated is to use UTF-8 for the most part and convert to and from UTF-16 only when talking to the Windows interface. – Keith Thompson Aug 14 '14 at 15:54
  • @casablanca that will cause some serious headaches with the C++ standard library unfortunately, stuff like exception messages is hard coded to be char, not wchar_t. There are some people who suggest not putting unicode in exception messages, but this is not very practical, because if you need to communicate something like "Cannot open file 바위처럼 단단한.txt" or "Record with name 바위처럼 단단한 does not exist" in an exception you won't be able to easily do it. Saying "exceptions don't need unicode" really means "your whole codebase uses unicode only for display purposes". – jrh Apr 02 '21 at 14:24
  • 3
    This answer is out of date, please refer to the updated answer below. – Erik Mar 18 '22 at 21:45
15

The accepted answer is no longer correct (as of Windows Version 1903 (May 2019 Update)).

An application can now set the active code page of its process to UTF-8 through its manifest. The -A functions (and CP_ACP) then work with UTF-8. A manifest that does this looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>

Source and additional information: Use the Windows UTF-8 code page

dialer
3

An easier approach than calling the raw Win32 API MultiByteToWideChar is to use the ATL conversion helpers, such as CA2W (or its const variant, CA2CW). Pass CP_UTF8 as the code page (the second constructor parameter) to convert from UTF-8 to UTF-16:

CreateDirectoryW(
  CA2W( utf8Name, CP_UTF8 ), // convert from UTF-8 to UTF-16
  ... // other stuff
);

Note that in Unicode builds (which should be the default these days), CreateDirectory simply expands to CreateDirectoryW, so I would drop the trailing "W" and use the (IMHO, more readable) CreateDirectory:

CreateDirectory(
  CA2W( utf8Name, CP_UTF8 ), // convert from UTF-8 to UTF-16
  ... // other stuff
);
Simon Mourier