
I'm trying to work with Hebrew characters in C++, using CLion on a Mac.

char notification[140]={"א"}; // this is ALEF, the first letter of the Hebrew alphabet.

for(int i=0; i < strlen(notification); i++) {
    cout << (int)notification[i] << endl;
} // Here I want to see what the ASCII code for this letter is.

The output of this loop is:

-41
-112

even though only one character was entered.

cout << char(-41) << char(-112) << endl; // this one prints the letter ALEF

cout << char(-41) << char(-111) << endl; // this one prints the second letter of the Hebrew alphabet

I can't understand how this works: why does it take two chars to represent one Hebrew character?

Mark Davydov
  • It's a two-byte Unicode character. – Igor R. Sep 07 '15 at 20:09
  • You need to use wide characters, i.e. Unicode. – Ed Heal Sep 07 '15 at 20:10
  • If you need actual Unicode handling, you will need a library like ICU. Neither `char` nor `wchar_t` nor `string` nor `wstring` nor anything else in the standard library implements Unicode. – Baum mit Augen Sep 07 '15 at 20:18
  • @BaummitAugen Is there no way to use the standard libraries to work with Hebrew chars? – Mark Davydov Sep 07 '15 at 20:28
  • @IgorR. Do you have any idea how to work with two-byte Unicode characters? For example, if I want to write a condition like: if(notification[0]==/*two-byte char*/) dosomething(); – Mark Davydov Sep 07 '15 at 20:30
  • It's not Unicode, it's UTF-8! – SHR Sep 07 '15 at 20:36
  • @MarkYoungCardinalDavidov If you only need to read and print them and not handle them in any complex way, you can probably get away with `std::string` on Linux and `std::wstring` on Windows and make your code portable with macros. But for anything more than that: don't waste your time, use something like ICU. – Baum mit Augen Sep 07 '15 at 20:36
  • @SHR UTF-8 *is* one possible encoding for Unicode data. – deviantfan Sep 07 '15 at 20:42
  • @deviantfan Yes, but it is not using wide character strings, and you don't need the wide functions like `wcout`. It looks like all the comments here advise using Unicode and wide character strings, while it's just a terminal configuration issue. – SHR Sep 07 '15 at 21:15
  • There is no ALEF in ASCII. You can't assign it to a `char` like that. – Lightness Races in Orbit Sep 07 '15 at 21:23

2 Answers


What you see is the UTF-8 encoding of "א", but apparently your terminal does not support this charset or UTF-8. Read as unsigned bytes, (-41, -112) = (0xD7, 0x90).
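As a rough sketch of how those two byte values map to the letter: assuming the string holds UTF-8 and `char` is signed on this platform, -41 and -112 are simply the bytes 0xD7 and 0x90 printed as signed numbers. The helper below (`decode_two_byte_utf8` is a hypothetical name, not a standard function) combines the payload bits of the two bytes into the code point U+05D0, which is ALEF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx)
// at the start of `s` into its Unicode code point.
// Returns 0 if the first two bytes do not have that shape.
std::uint32_t decode_two_byte_utf8(const std::string& s) {
    if (s.size() < 2) return 0;
    // Cast to unsigned char first: where char is signed,
    // the byte 0xD7 would otherwise be seen as -41.
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80) return 0;
    // 5 payload bits from the lead byte, 6 from the continuation byte.
    return (static_cast<std::uint32_t>(b0 & 0x1F) << 6) | (b1 & 0x3Fu);
}
```

Called with the bytes 0xD7 0x90 this yields 0x05D0 (ALEF), and with 0xD7 0x91 it yields 0x05D1 (the second letter), which matches the output in the question.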

Look here for UTF-8 Hebrew characters

You need to find out how to configure the terminal to support the Hebrew charset and UTF-8.

maybe this can help

SHR

There are several sub-problems here.

a)
You need your data in some Unicode encoding, instead of ASCII-based one-byte characters. You have that already; if you didn't, no programming language feature in the world would do the conversion for you automatically.

b)
Since you have UTF-8, std::string etc. can handle the data well, depending on what you're doing.
E.g.:

  • input and output from/to files is ok
  • getting the used byte length is ok
  • (input/output to the terminal depends on the used terminal)
    ...
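To illustrate the byte-level operations that work fine, here is a minimal sketch, assuming the source text is UTF-8. `starts_with_utf8` is a hypothetical helper name. It compares whole byte sequences instead of a single `char`, which is one way to express a test like "does this text start with ALEF?".

```cpp
#include <string>

// Hypothetical helper: true if `text` begins with the bytes of `prefix`.
// For UTF-8 data this is a plain byte-wise comparison; no decoding is needed.
bool starts_with_utf8(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}
```

For example, `starts_with_utf8(notification, "\xD7\x90")` checks for a leading ALEF, and `std::string("\xD7\x90").size()` reports the byte length, 2, not the number of characters.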

What is a problem is, e.g.:

  • counting how many characters (not bytes) there are
  • accessing single characters with varname[number]
  • stuff like Unicode normalization

... for such things, you'll need some more coding and/or external libs like ICU.
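As an example of the "some more coding" case, counting code points in UTF-8 can be sketched by skipping continuation bytes. This assumes the input is valid UTF-8, and `count_code_points` is a hypothetical name. Note that it counts code points, not user-perceived characters: a letter followed by combining marks counts more than once, and that is the kind of task where a library like ICU is the safer choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a string that is
// assumed to contain valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx; every other byte
        // starts a new code point.
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the two bytes of ALEF (0xD7 0x90) this returns 1, while `size()` on the same string returns 2.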

c)
Your terminal needs to support UTF-8 if you want to print such strings directly to it (or read input from the user). This depends completely on the OS in use and its configuration; the C++ part can't help here. See e.g. OS X Terminal UTF-8 issues

deviantfan