4

What character encoding does libc expect? For example, gethostname(char *name, size_t namelen); takes a char * as its argument. Is the name parameter expected to be encoded in UTF-8 (which keeps ASCII intact), in plain ASCII, or in some other format?

Also, does C mandate any character encoding scheme?

chappar

4 Answers

3

All string functions (except the wide-character ones) support only the native character set, e.g. ASCII on Unix/Linux/Windows or EBCDIC on IBM mainframe/midrange computers.
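As a rough illustration of what "native character set" means here, the sketch below inspects the value the compiler assigns to 'A'. The helper name native_charset_is_ascii is made up for this example, and the check is only a heuristic: 0x41 is the ASCII code for 'A', while EBCDIC code pages commonly use 0xC1.

```c
#include <stdio.h>

/* Hypothetical helper (not part of libc): returns 1 if the execution
   character set looks ASCII-compatible, 0 if it looks like EBCDIC,
   and -1 if it is neither. This is a heuristic based on one character. */
int native_charset_is_ascii(void)
{
    if ('A' == 0x41)   /* ASCII code for 'A' */
        return 1;
    if ('A' == 0xC1)   /* common EBCDIC code for 'A' */
        return 0;
    return -1;
}

/* Prints a short human-readable summary of the result. */
void print_native_charset(void)
{
    int r = native_charset_is_ascii();
    printf("native charset: %s\n",
           r == 1 ? "ASCII-compatible" : r == 0 ? "EBCDIC" : "unknown");
}
```

Calling print_native_charset() from main reports which case applies on the machine the program was compiled for.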

qrdl
  • How do you use these functions in a non-English environment? – chappar May 28 '09 at 06:50
  • Also, I think libc doesn't have wchar_t * equivalents of all the char * functions. – chappar May 28 '09 at 06:55
  • You have to convert yourself or get a library to do the job - see more here: http://stackoverflow.com/questions/313555/light-c-unicode-library. Anyway, you cannot name your host in UTF-8, can you? – qrdl May 28 '09 at 06:57
1
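  • char uses ASCII (more precisely, the platform's native narrow character set)
  • wchar_t is the standard C data type for wide characters, which is what is normally used for Unicode text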

Use the standard wide-character headers, such as <wchar.h> and <wctype.h>, in order to deal with wide characters.
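A minimal sketch of moving from char to wchar_t with the standard mbstowcs function follows. The wrapper name to_wide is hypothetical, and the conversion depends on the current LC_CTYPE locale, so a real program would normally call setlocale(LC_ALL, "") first; in the default "C" locale only plain ASCII input is certain to convert.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical wrapper around mbstowcs: converts the multibyte string
   src into dst (capacity dst_len wide characters, including the
   terminator). Returns the number of wide characters written, not
   counting the terminator, or (size_t)-1 on failure. */
size_t to_wide(const char *src, wchar_t *dst, size_t dst_len)
{
    if (dst_len == 0)
        return (size_t)-1;

    /* Leave room for the terminating null wide character. */
    size_t n = mbstowcs(dst, src, dst_len - 1);
    if (n == (size_t)-1)
        return n;          /* invalid multibyte sequence for this locale */

    dst[n] = L'\0';        /* always terminate, even if output was truncated */
    return n;
}
```

Once the text is in a wchar_t buffer, functions from <wchar.h> such as wcslen can be applied to it.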

dfa
0

char should be a 7-bit-compatible ASCII encoding (I can't find a definitive reference on this, though). The definition of wchar_t is left to the implementation, but the C standard requires that characters from the C portable character set have the same value in both types. If I understand this correctly, then

char a = 'a';
wchar_t aw = L'a';
if (a == (char)aw) {
    // should be true
}

The standard does not say anything about UTF-8.
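The comparison above can be turned into a small check over several characters, sketched here under my own assumptions. The function name basic_chars_match and the sample string are invented for the example, and as far as I know an implementation that defines the macro __STDC_MB_MIGHT_NEQ_WC__ is allowed to fail this check.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical check: returns 1 if a sample of characters from the basic
   character set has the same value as char and as wchar_t, 0 otherwise. */
int basic_chars_match(void)
{
    static const char    narrow[] =  "abcXYZ019 _";
    static const wchar_t wide[]   = L"abcXYZ019 _";

    for (size_t i = 0; narrow[i] != '\0'; ++i) {
        /* Compare each narrow character with its wide counterpart. */
        if ((wchar_t)narrow[i] != wide[i])
            return 0;
    }
    return 1;
}
```

This only samples a handful of characters, so a result of 1 is evidence rather than proof that the whole basic character set agrees.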

JesperE
0

You will probably have to use a third-party library, such as GLib. This library is portable and very useful; it also provides regular expressions, data structures, and more.

Bastien Léonard