
OK, I'm getting a curious error while building with Visual Studio Ultimate 2012 (probably an ANSI vs. Unicode encoding issue) in this code:

switch (input[index])
{
    case 'א': // Alef Hebrew character
        if (/*conditional*/) 
        {
            // Do stuff.
        }
    break;

    case 'ב': // Beth Hebrew character
        if (/*conditional*/)
        {
            //Do stuff
        }
    break;

    default:
    {
            //Do some other stuff.
    }
    break;

}

The second case label generates:

Error C2196: case value '?' already used

Is there a simple fix?

loumbut5

3 Answers


Assuming `input` is an array of `wchar_t`, your problem is that you're comparing a wide character to a narrow character literal.

As PeterT said in the comments:

If you save the file as UTF-8, then א is 0xD790 and ב is 0xD791, so if `input[index]` is of type `char`, both would try to match 0xD7.

That's why you're getting the error you mentioned: a `char` is a single byte, which is enough for an ASCII value but not for a multi-byte character, so only part of each letter is kept and the rest is discarded.


You can fix this by prefixing your literals with a capital `L`, which turns them into wide character literals.

case L'א': // Alef Hebrew character
    if (/*conditional*/) 
    {
        // Do stuff.
    }
break;

case L'ב': // Beth Hebrew character
    if (/*conditional*/)
    {
        //Do stuff
    }
break;

Also, make sure your source file is saved in a Unicode encoding (for Visual Studio, typically UTF-8 with a byte-order mark) and that your compiler reads it in that encoding.

Alternatively, you can avoid depending on the source file's encoding altogether by escaping the Unicode code point, like so:

case L'\u05D0': // Aleph Hebrew character
// ...
case L'\u05D1': // Beth Hebrew character
Ivan Rubinson
  • The constants should be `\uD790` etc.; it's an error to have `0x` there – M.M Sep 06 '16 at 09:16
  • It should be `\u05D0` for Aleph and `\u05D1` for Beth. D790 is the UTF-8 encoded value, which is not what should be used after `\u`. – fefe Sep 06 '16 at 10:37
  • @IvanRubinson On Windows, `wchar_t` is a 16-bit type because Microsoft is shortsighted. Many Unicode characters have code points larger than that. – fuz Sep 06 '16 at 11:23
  • @FUZxxl On all compilers? Or just Visual Studio? – Ivan Rubinson Sep 06 '16 at 11:35
  • @IvanRubinson That's part of the Windows ABI, not compiler specific. – fuz Sep 06 '16 at 11:46
  • @IvanRubinson, does the same apply for Greek characters e.g. case L'β' || 'Β': if (/*conditional*/) { // Do stuff. } or case L'β' || L'Β': if (/*conditional*/) { // Do stuff. } – loumbut5 Sep 06 '16 at 12:41
  • @loumbut5 No reason why not. Just try it and see. – Ivan Rubinson Sep 06 '16 at 12:44
  • @IvanRubinson Nope. I still get Error C2196: case value '1' already used in the second case: `case L'α' || L'Α':` /*is fine*/ `case L'β' || L'Β':` /*C2196*/. Any ideas what's wrong? I'm building with Visual Studio Ultimate 2012 and the project properties are set to use the Multi-Byte Character Set. – loumbut5 Sep 06 '16 at 12:57
  • Try the second approach, the one with escaping (`'\u...'`) – Ivan Rubinson Sep 06 '16 at 12:58
  • case '\u03B2' || '\u0392': for case L'β' || L'Β': still generates C2196. – loumbut5 Sep 06 '16 at 13:32
  • This answer does not directly apply to C, as `'א'` and `'ב'` are already distinct `int` values. Still, using `L` is a good idea. – chux - Reinstate Monica Sep 06 '16 at 14:57

C (and C++) are notoriously bad at Unicode handling. The issue is that you're trying to fit a Hebrew character into a `char`. But a `char` is a single byte: wide enough for ASCII, but not for a Hebrew letter, which takes two bytes in UTF-8.

If you're using modern C++ (C++11 or later), follow this answer.

If you're using C, you'll probably want to use IBM's ICU library.

Ven

The 'א' character cannot fit into a `char`. It usually takes several bytes (two in UTF-8), so 'א' becomes a "multicharacter literal", just like 'ab' (note the SINGLE quotes).

A "multicharacter literal" has type `int` and an implementation-defined value. Different multicharacter literals may therefore have the same value, since arbitrarily many of them map onto a finite range of `int` values.

In your case, the compiler apparently converted both 'א' and 'ב' to '?' (presumably because neither letter can be represented in the narrow execution character set), so the same value ('?') appears after two case labels in the same switch, which is an error.

If you are using `wchar_t`, Ivan's answer already gives the fix.

If you are using `char`, then א is a multi-byte string rather than a single `char`. You can use a string comparison function such as `strncmp` to check whether א appears at a given index of `input`:

const int ONE = 1;
if (strncmp(input+index, "א", sizeof("א") - ONE) == 0) {
      // Alef Hebrew character
    if (/*conditional*/) 
    {
       // Do stuff.
    }
} else if (strncmp(input+index, "ב", sizeof("ב") - ONE) == 0) {
     // Beth Hebrew character
    if (/*conditional*/)
    {
        //Do stuff
    }
} else {
    // default case:
    // Do some other stuff.
}
fefe
  • I am not able to type `1` directly there. It seems that without an English letter, part of the line is switched to right-to-left instead of the normal left-to-right direction, which produces weird output. So I used `ONE` instead. – fefe Sep 06 '16 at 10:45
  • The question does not state that the type of `input[index]` is `char`. – barak manos Sep 06 '16 at 10:45
  • @barakmanos Right. But Alan already handled the `wchar_t` part, which is mentioned in the answer. – fefe Sep 06 '16 at 10:46