
I want to create two functions that convert between characters and integers. One function takes a character, for example the character a, and returns the integer 97. The other function takes the integer 97 and returns the character a. I know this can be done using the ASCII codes of these characters, but then it wouldn't work for characters like é, à, ö. Can this be done using Unicode or in another way?

For example:

int character_to_integer(char c) {
    /* convert character to integer and return */
}

Input: character_to_integer('é');
Output: 102 (for example)

char integer_to_character(int i) {
    /* convert integer to character and return */
}

Input: integer_to_character(102);
Output: é

I want to use this as follows: I have an array, for example int my_array[5], with all elements set to 0 at the start. Then, for example, indexes 0, 3 and 4 (which correspond to a, d and e) are set to something other than 0, and I want to loop over the array and build a string based on which indexes aren't 0, like so:

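#include <stdio.h>

void build_string_from_array(const int *my_array) {
    char buffer[16];
    int len = 0;                      /* number of characters written so far */
    for (int i = 0; i < 5; i++) {
        if (my_array[i] != 0) {       /* 0 marks an empty slot */
            buffer[len++] = integer_to_character(i);  /* append without gaps */
        }
    }
    buffer[len] = '\0';               /* terminate right after the last character */
    printf("%s\n", buffer);
}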

Output: ade

Note: this is just an example, and I know there is probably something wrong with it, but it gets my point across. I know this can be done with ASCII codes, where every character fits in a single char, but how can this be done so that characters like é, which take up 2 chars, also work?

If it's not clear what I mean just ask me and I'll elaborate some more.

Mosbas
  • Maybe you can take a look at [this link](http://stackoverflow.com/questions/18819861/c-string-char-and-accents). – Matriac Mar 18 '16 at 15:56
  • Also, since, as you said, an ASCII character will do, why do you want to use multiple characters for the representation? – Careful Now Mar 18 '16 at 16:00
  • C does not have a character type. `char` **is** an integer type. Solution in search of a problem? – too honest for this site Mar 18 '16 at 16:03
  • @CarefulNow ASCII code works for characters like `a`, but not for characters like `á`. – Mosbas Mar 18 '16 at 16:05
  • @Mosbas I haven't implemented this but try this page for the codes. It is extended ascii so will not be supported natively by chars but you could implement it yourself http://www.theasciicode.com.ar/extended-ascii-code/letter-e-acute-accent-e-acute-lowercase-ascii-code-130.html – Careful Now Mar 18 '16 at 16:10
  • You need to know what *encoding* is being used to represent the accented characters. Probably UTF-8 since they're multi-byte, but C isn't trying to interpret those bytes so it doesn't care. If it is UTF-8 then you can convert to Unicode codepoints using code similar to [this](http://stackoverflow.com/a/148766/5987). – Mark Ransom Mar 18 '16 at 16:28
  • Regarding the build string function: you are passing an int but using it as a pointer to an int array, so this needs to be an int* (?). – Neil Mar 18 '16 at 16:51
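To make Mark Ransom's point concrete: in UTF-8, é (U+00E9) is stored as the two bytes 0xC3 0xA9, so a plain char array sees two elements where a reader sees one character. A minimal sketch, assuming the input is valid UTF-8, that counts characters (code points) instead of bytes; `utf8_length` is a hypothetical helper name, not a library function:

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 string.
   Continuation bytes have the bit pattern 10xxxxxx; every other byte
   (ASCII or a multi-byte lead byte) starts a new character.
   Assumes valid UTF-8 input; no validation is done. */
size_t utf8_length(const char *s) {
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)  /* not a continuation byte */
            count++;
    }
    return count;
}
```

For example, `"\xC3\xA9"` (é) is 2 bytes long but `utf8_length` reports 1.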

1 Answer


For single-byte characters this is no problem, since char is an integer type:

int i = 'B';

and

char c = 0x33;

will work fine.
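For the single-byte case, the two functions from the question can be sketched as plain casts. One assumption worth flagging: plain `char` is signed on many platforms, so bytes of 0x80 and above would come out negative; casting through `unsigned char` keeps the result in 0..255.

```c
/* Single-byte version only: each character is exactly one char. */
int character_to_integer(char c) {
    /* Cast through unsigned char so bytes >= 0x80 give 128..255
       instead of negative values when plain char is signed. */
    return (unsigned char)c;
}

char integer_to_character(int i) {
    /* Only meaningful for values 0..255. */
    return (char)i;
}
```

With an ASCII-compatible execution character set, `character_to_integer('a')` gives 97 and `integer_to_character(97)` gives `'a'`.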

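But if you use UTF-8 with characters that take more than one byte, you must convert the UTF-8 string to a UCS-4 (UTF-32) string. Sadly, standard C has no function that does this independently of the locale: mbrtowc (and C11's mbrtoc32) decode UTF-8 only when the current locale uses UTF-8.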
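A minimal sketch of doing that conversion by hand instead of through the locale. `utf8_decode` and `utf8_encode` are hypothetical helper names, not a library API. `utf8_decode` plays the role of `character_to_integer` and `utf8_encode` the role of `integer_to_character`, except that the "character" is a short UTF-8 byte sequence rather than a single char. The decoder does not reject overlong forms or surrogate code points, so treat it as an illustration, not a validating parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp.
   Returns the number of bytes consumed (1..4), or 0 if malformed.
   The && chains stop at the first non-continuation byte, so a
   NUL terminator is never read past. */
size_t utf8_decode(const char *s, uint32_t *cp) {
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80) {                                   /* 0xxxxxxx */
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) { /* 110xxxxx */
        *cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    }
    if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 &&
        (p[2] & 0xC0) == 0x80) {                         /* 1110xxxx */
        *cp = ((uint32_t)(p[0] & 0x0F) << 12) |
              ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    }
    if ((p[0] & 0xF8) == 0xF0 && (p[1] & 0xC0) == 0x80 &&
        (p[2] & 0xC0) == 0x80 && (p[3] & 0xC0) == 0x80) { /* 11110xxx */
        *cp = ((uint32_t)(p[0] & 0x07) << 18) |
              ((uint32_t)(p[1] & 0x3F) << 12) |
              ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Encode code point cp as UTF-8 into out (room for at least 5 bytes),
   NUL-terminated. Returns the byte count excluding the NUL,
   or 0 if cp is beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, char *out) {
    size_t n;
    if (cp < 0x80) {
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp < 0x110000) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;
    }
    out[n] = '\0';
    return n;
}
```

With these, é round-trips as the code point 233 (0xE9) and the bytes 0xC3 0xA9. The array loop from the question could then append the bytes produced by `utf8_encode` instead of a single char, as long as the buffer has room for up to 4 bytes per character.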

See also this post: Converting a UTF-8 text to wchar_t

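Another way is to use wchar_t everywhere. This does not work well on Windows for characters outside the BMP, because wchar_t there is 16 bits wide and holds UTF-16, so such characters need two wchar_t units (a surrogate pair); in that sense wchar_t on Windows is still a multi-unit encoding. On Linux, where wchar_t is 32 bits, it works as long as you don't use combining characters (for example e followed by U+0301 COMBINING ACUTE ACCENT, which displays like é but is two code points).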
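A minimal sketch of the wchar_t approach, assuming a platform where wchar_t values are Unicode code points (true for the usual Linux and Windows compilers, but not guaranteed by the C standard). `wide_to_integer`, `integer_to_wide` and `dump_wide` are hypothetical names for illustration. Universal character names (`\u00e9`, `\U0001F600`) are used so the example does not depend on the source file's encoding.

```c
#include <stdio.h>
#include <wchar.h>

/* With wchar_t, a character in the BMP is already its code point. */
int wide_to_integer(wchar_t c) {
    return (int)c;
}

wchar_t integer_to_wide(int i) {
    return (wchar_t)i;
}

/* Print each wchar_t unit of s in hex. With a 32-bit wchar_t (Linux)
   each unit is a full code point; with a 16-bit wchar_t (Windows,
   UTF-16) a character outside the BMP shows up as two surrogate units. */
void dump_wide(const wchar_t *s) {
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    for (size_t i = 0; s[i] != L'\0'; i++)
        printf("unit %zu: 0x%lX\n", i, (unsigned long)s[i]);
}
```

Calling `dump_wide(L"\u00e9\U0001F600")` shows the difference the answer describes: two units on a 32-bit wchar_t platform, three on Windows.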

Community