Difference and conversions between wchar_t for Linux and for Windows

Question

I understand from this and this thread that on Windows wchar_t is 16-bit, while on Linux wchar_t is 32-bit.

I have a client-server architecture (using just pipes, not sockets) where my server is Windows-based and the client is Linux-based.

The server has an API to retrieve the hostname from the client. When the client is Windows-based, it can just call GetComputerNameW and return a wide string. However, when the client is Linux-based, things get messy.

As a first naive approach, I used mbstowcs(), hoping to return a wchar_t* to the Windows server side. However, this LPWSTR (I have typedef wchar_t* LPWSTR on my Linux client side) is not recognizable on Windows, since Windows expects wchar_t to be 16-bit.

So, is converting the output of gethostname() on Linux, which is a char*, to unsigned short (16-bit) my only option?

Thanks in Advance!

use libicu. You will need to convert from UCS-2 windows to UCS-4 Linux, UTF-8 linux file system, etc. — joy, Nov 27 '12 at 20:54
How about trying [UTF-8 Everywhere](http://utf8everywhere.org/)? Your linux code will be clean, and the windows part will be easily interoperable with the rest of your application. — Yakov Galka, Nov 27 '12 at 21:01

score 6 · Answer 1 · answered Nov 27 '12 at 20:48

You will have to decide on the actual protocol used to transport the data across the wire. There are several options, although UTF-8 is usually the most sensible one. It also means that under Linux you can basically use the data as-is (there is no reason to use wchar_t to begin with, although you can obviously convert it into whatever you want).
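A minimal sketch of the Linux side when UTF-8 is the wire encoding. The framing (a 2-byte little-endian length followed by the bytes) and the names `frame_utf8` and `frame_hostname` are assumptions made for illustration, not part of the original setup; any framing that both ends agree on would do. Hostnames are normally plain ASCII, which is already valid UTF-8, so no conversion step is needed here.

```c
#define _POSIX_C_SOURCE 200112L  /* makes gethostname() visible under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wire format (an assumption, not from the original post):
 * a 2-byte little-endian byte count followed by that many UTF-8 bytes.
 * Returns the total number of bytes written, or 0 if the string does
 * not fit in the length field or in the output buffer. */
size_t frame_utf8(const char *s, unsigned char *out, size_t out_cap)
{
    assert(s != NULL && out != NULL);

    size_t len = strlen(s);
    if (len > UINT16_MAX || out_cap < len + 2)
        return 0;

    out[0] = (unsigned char)(len & 0xFF);  /* low byte first */
    out[1] = (unsigned char)(len >> 8);
    memcpy(out + 2, s, len);               /* payload is sent as-is */
    return len + 2;
}

/* Reads the local hostname and frames it; no wchar_t is involved. */
size_t frame_hostname(unsigned char *out, size_t out_cap)
{
    char name[256];
    if (gethostname(name, sizeof name) != 0)
        return 0;
    name[sizeof name - 1] = '\0';  /* not guaranteed to be terminated on truncation */
    return frame_utf8(name, out, out_cap);
}
```

The client would then write the returned number of bytes from the buffer to the pipe, and the server would read the length first and the payload second.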

Under Windows you will have to convert the UTF-8 into the UTF-16 that Windows wants (strictly speaking that is a simplification, but it is close enough here), and when you want to send data you have to convert it back to UTF-8. Luckily Windows provides one function for each direction, meant for exactly these purposes.

Obviously you can decide on any encoding you want, not necessarily UTF-8. The process is the same: when receiving data, convert it to the native format of the OS; when sending, convert it to your on-wire encoding. iconv works on Linux if you don't use UTF-8.

@neagoegab That pretty much depends on which graphics library you're actually using doesn't it? But QT for example allows users to use UTF-8 just fine even if it internally uses UTF-16 to represent the data. — Voo, Nov 27 '12 at 21:06

score 2 · Answer 2 · answered Nov 27 '12 at 20:53

You are best off choosing a standard character encoding for the data you send over the pipe, and then require all machines to send their data using that encoding.

Windows uses UTF-16LE, so you could choose to use UTF-16LE over the pipe and then Windows machines can send their UTF-16LE encoded strings as-is, but Linux machines would have to convert to/from UTF-16LE as needed.

Or you could choose UTF-8 instead, which would reduce network bandwidth, but both Windows and Linux machines would have to convert to/from UTF-8 as needed. For network communications, UTF-8 would be the better choice.

On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte() with the CP_UTF8 codepage.

In Linux, use the iconv() API so you can specify the UTF-8 charset for encoding/decoding.
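As a rough sketch of the Linux side when UTF-16LE is the wire encoding, the conversion with iconv() might look like the following. The helper names (`convert_charset`, `utf8_to_utf16le`, `utf16le_to_utf8`) are hypothetical, and the sketch assumes the local strings are UTF-8 and that the whole input fits in one call. The output carries no BOM and no terminator, so the length has to travel with the data.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts in_len bytes from one charset to another.
 * Returns the number of bytes written to out, or (size_t)-1 on failure
 * (unknown charset, invalid input, or output buffer too small). */
size_t convert_charset(const char *to, const char *from,
                       const char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    assert(to != NULL && from != NULL && in != NULL && out != NULL);

    iconv_t cd = iconv_open(to, from);  /* note the order: target first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;             /* iconv() takes char **, but only reads the input */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_cap - dst_left;
}

/* Linux -> Windows: UTF-8 string to UTF-16LE bytes. */
size_t utf8_to_utf16le(const char *utf8, char *out, size_t out_cap)
{
    return convert_charset("UTF-16LE", "UTF-8", utf8, strlen(utf8), out, out_cap);
}

/* Windows -> Linux: UTF-16LE bytes back to UTF-8. */
size_t utf16le_to_utf8(const char *utf16, size_t utf16_len,
                       char *out, size_t out_cap)
{
    return convert_charset("UTF-8", "UTF-16LE", utf16, utf16_len, out, out_cap);
}
```

With UTF-8 on the wire instead, the Linux side needs neither helper and the conversion moves entirely to the Windows side.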

Don't really see any reason to use `wchar_t` under linux to begin with. Only causes problems and I can't think of many APIs that don't use UTF-8 and just return `char*` to begin with. — Voo, Nov 27 '12 at 20:55
