C/C++ switch case on char arrays

Question

I have several data structures, each having a field of 4 bytes.

Since 4 bytes equal 1 int on my platform, I want to use them in case labels:

switch (* ((int*) &structure->id)) {
   case (* ((int*) "sqrt")): printf("its a sqrt!"); break;
   case (* ((int*) "log2")): printf("its a log2!"); break;
   case ((((int) 'A') << 8) + (int) 'B'): printf("works somehow, but unreadable"); break;
   default: printf("unknown id");
}

This results in a compile error, telling me the case expression does not reduce to an int.

How can I use char arrays of limited size, and cast them into numerical types to use in switch/case?

Is this for C++ (as in the question title) or for C99 (as in the tags)? I'm not sure the answer is different between the two, but seeing two different languages in the question with no clear reason is confusing. — hmakholm left over Monica, Aug 16 '11 at 19:14
Now there are _two_ languages in the title, one of which is repeated in the tags. Are we to infer from this that you're not speaking about C++ after all? Why is it still in the title then? — hmakholm left over Monica, Aug 16 '11 at 19:24
Why write code that is confusing, likely to break, non-portable? Just use a dispatch table instead. — Ed Heal, Aug 17 '11 at 06:41

score 5 · Answer 1 · edited May 23 '17 at 12:13

Follow the exact method employed in video encoding with FourCC codes:

See also: Set a FourCC value in C++

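#include <stdint.h>
#define FOURCC(a,b,c,d) ( (uint32_t) ( ((uint32_t)(unsigned char)(d) << 24) | ((uint32_t)(unsigned char)(c) << 16) | ((uint32_t)(unsigned char)(b) << 8) | (uint32_t)(unsigned char)(a) ) )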

It is probably a good idea to use an enumeration or macros for each identifier:

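enum {
    ID_SQRT = FOURCC( 's', 'q', 'r', 't'),
    ID_LOG2 = FOURCC( 'l', 'o', 'g', '2')
};

/* Build the key from the struct's bytes with the same macro,
   so its byte order matches the case labels. */
uint32_t structure_id = FOURCC( structure->id[0],
                                structure->id[1],
                                structure->id[2],
                                structure->id[3] );
switch (structure_id) {
case ID_SQRT: /* ... */ break;
case ID_LOG2: /* ... */ break;
default:      /* unknown id */ break;
}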
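A self-contained sketch of the FourCC approach, assuming the id is a fixed `char id[4]` array that is not NUL-terminated. The names `struct item` and `classify` are hypothetical, chosen only for illustration. Both the case labels and the switch key go through the same macro, so the result does not depend on the platform's byte order or on how `id` is aligned.

```c
#include <stdint.h>

/* Pack four characters into one 32-bit code, first character in the lowest byte.
   Each argument goes through unsigned char first, so signed chars cannot sign-extend. */
#define FOURCC(a, b, c, d)                          \
    ((uint32_t)(unsigned char)(a)                   \
     | ((uint32_t)(unsigned char)(b) << 8)          \
     | ((uint32_t)(unsigned char)(c) << 16)         \
     | ((uint32_t)(unsigned char)(d) << 24))

/* Enumeration constants must fit in an int; plain ASCII codes do. */
enum {
    ID_SQRT = FOURCC('s', 'q', 'r', 't'),
    ID_LOG2 = FOURCC('l', 'o', 'g', '2')
};

/* Hypothetical structure with a 4-byte identifier. */
struct item {
    char id[4];
};

/* Returns 1 for "sqrt", 2 for "log2", 0 for anything else. */
int classify(const struct item *it)
{
    uint32_t key = FOURCC(it->id[0], it->id[1], it->id[2], it->id[3]);
    switch (key) {
    case ID_SQRT: return 1;
    case ID_LOG2: return 2;
    default:      return 0;
    }
}
```

For example, with `struct item it = {{'s', 'q', 'r', 't'}};`, calling `classify(&it)` returns 1.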

templatetypedef · Answer 2 · 2011-08-17T06:19:52.633

I believe that the issue here is that in C, each case label in a switch statement must be an integer constant expression. From the C ISO spec, §6.8.4.2/3:

The expression of each case label shall be an integer constant expression [...]

(my emphasis)

The C spec then defines an "integer constant expression" as a constant expression where (§6.6/6):

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.

(my emphasis again). This means you cannot cast a string literal (which decays to a pointer) to int* and dereference it in a case label: neither the pointer cast nor the dereference is allowed in an integer constant expression.

Intuitively, the reason for this might be that on some implementations the actual location of the strings in the generated executable isn't determined until linking. Consequently, the compiler might not be able to emit very good code for the switch statement if the labels depended on a constant expression that depended indirectly on the address of those strings, since it might miss opportunities to build jump tables, for example. This is just an example, but the more rigorous language of the spec explicitly forbids what you've described above.
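To make the rule concrete, here is a small sketch of case labels that do qualify as integer constant expressions under §6.6/6: a character constant, arithmetic on character constants, a sizeof expression, an enumeration constant, and a floating constant that is the immediate operand of a cast. The names `label_kind` and `SMALL` are hypothetical. A label such as `*(int *)"sqrt"` is rejected because it needs a pointer conversion and a dereference.

```c
enum { SMALL = 10 };

/* Returns which kind of constant matched (1-5), or 0 for no match.
   Every case label below is an integer constant expression. */
int label_kind(int v)
{
    switch (v) {
    case 'A':              return 1; /* character constant */
    case ('A' << 8) | 'B': return 2; /* arithmetic on character constants */
    case sizeof(char[7]):  return 3; /* sizeof expression, value 7 */
    case SMALL:            return 4; /* enumeration constant */
    case (int)2.5:         return 5; /* floating constant as immediate operand of a cast */
    /* case *(int *)"sqrt": would not compile: pointer conversion and dereference */
    default:               return 0;
    }
}
```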

Hope this helps!

Frerich Raabe · Answer 3 · 2011-08-17T06:44:07.627

The issue is that each case label of a switch must be a constant known at compile time. The address of a string literal isn't known at compile time: the linker assigns it, and even that may not be the final address. I think the final, relocated address is only available at run time.

You can simplify your problem to

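/* At file scope an array size must be an integer constant expression
   (inside a C99 function this would instead declare a variable-length array). */
int x[*(int*)"x"];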

This yields the same error, since the array size depends on the address (and contents) of the "x" literal, which are not known at compile time. This is different from e.g.

void f() {
    int x[sizeof("x")];
}

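This works because the compiler knows the size of the literal: sizeof("x") is the size of the array type char[2], i.e. 2 (not the size of a pointer), and that does not depend on where the literal is stored.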
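A short sketch contrasting the two sizes: `sizeof` applied to a string literal gives the size of the array including the terminating NUL, while `sizeof` of a `const char *` that the literal has decayed into gives the pointer size. The function and variable names are hypothetical.

```c
#include <stddef.h>

/* File-scope array: sizeof("x") is an integer constant expression (value 2). */
static int ok_array[sizeof("x")];

/* Size of the array type char[2]. */
size_t literal_size(void) { return sizeof("x"); }

size_t pointer_size(void)
{
    const char *p = "x"; /* the literal decays to a pointer here */
    return sizeof p;     /* size of const char *, typically 4 or 8 */
}

size_t ok_array_length(void) { return sizeof ok_array / sizeof ok_array[0]; }
```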

Now, how to fix your problem? Two things come to my mind:

Don't make the id field a string but an integer and then use a list of constants in your case statements.
I suspect that you will need to do a switch like this in multiple places, so my other suggestion is: don't use a switch in the first place to execute code depending on the type of the structure. Instead, the structure could offer a function pointer which can be called to do the right printf call. At the time the struct is created, the function pointer is set to the correct function.
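The first suggestion could look roughly like this, assuming you control how the structures are created. The textual name is converted once, at creation time, and every later switch works on plain enumeration constants. The names `enum op_id`, `struct op`, `op_id_from_name`, and `op_describe` are hypothetical.

```c
#include <string.h>

/* The identifier is stored as an integer, so case labels are plain constants. */
enum op_id { OP_UNKNOWN, OP_SQRT, OP_LOG2 };

struct op {
    enum op_id id;
    /* ... other fields ... */
};

/* Convert the textual name once, when the structure is created. */
enum op_id op_id_from_name(const char *name)
{
    if (strcmp(name, "sqrt") == 0) return OP_SQRT;
    if (strcmp(name, "log2") == 0) return OP_LOG2;
    return OP_UNKNOWN;
}

/* Later dispatch is an ordinary switch on integer constants. */
const char *op_describe(const struct op *o)
{
    switch (o->id) {
    case OP_SQRT: return "its a sqrt!";
    case OP_LOG2: return "its a log2!";
    default:      return "unknown id";
    }
}
```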

Here's a code sketch illustrating the second idea:

#include <stdio.h>
#include <string.h>

struct MyStructure {
   const char *id;
   void (*printType)(struct MyStructure *);
   void (*doThat)(struct MyStructure *, int arg1, int arg2);
   /* ... */
};

static void printSqrtType( struct MyStructure *s ) {
   (void)s; /* unused in this example */
   printf( "its a sqrt\n" );
}

static void printLog2Type( struct MyStructure *s ) {
   (void)s;
   printf( "its a log2\n" );
}

static void printUnreadableType( struct MyStructure *s ) {
   (void)s;
   printf( "works somehow, but unreadable\n" );
}

/* Initializes the function pointers in the structure depending on the id. */
void setupVTable( struct MyStructure *s ) {
  if ( !strcmp( s->id, "sqrt" ) ) {
    s->printType = printSqrtType;
  } else if ( !strcmp( s->id, "log2" ) ) {
    s->printType = printLog2Type;
  } else {
    s->printType = printUnreadableType;
  }
}

With this in place, your original code can just do:

void f( struct MyStructure *s ) {
    s->printType( s );
}

That way, you centralize the type check in a single place instead of cluttering your code with a lot of switch statements.
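A related variant, along the lines of the dispatch table suggested in the question's comments, keeps the id-to-handler mapping in one static table instead of storing a pointer in every structure. This is a sketch: `handler_fn`, `struct handler_entry`, `find_handler`, and the handler functions are hypothetical names, and comparing exactly four bytes with `memcmp` assumes fixed-width ids.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*handler_fn)(void);

static const char *handle_sqrt(void) { return "its a sqrt!"; }
static const char *handle_log2(void) { return "its a log2!"; }

/* One row per known 4-byte id; supporting a new id means adding a row. */
struct handler_entry {
    char id[4];
    handler_fn fn;
};

static const struct handler_entry handlers[] = {
    { {'s', 'q', 'r', 't'}, handle_sqrt },
    { {'l', 'o', 'g', '2'}, handle_log2 },
};

/* Linear search over the table; returns NULL for an unknown id. */
handler_fn find_handler(const char id[4])
{
    for (size_t i = 0; i < sizeof handlers / sizeof handlers[0]; i++) {
        if (memcmp(handlers[i].id, id, 4) == 0)
            return handlers[i].fn;
    }
    return NULL;
}
```

With a handful of ids a linear scan is fine; a sorted table with `bsearch` would be the next step if the table grows.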

Sebastian Mach · Accepted Answer · 2011-08-17T06:57:26.907

Disclaimer: Don't use this except for fun or learning purposes. For serious code, use common idioms and never rely on compiler-specific behaviour in the general case; if you do anyway, incompatible platforms should trigger a compile-time error or fall back to the good, general code.

It seems the standard allows multi-character character constants as per the grammar. Haven't checked yet whether the following is really legal though.

~/$ cat main.cc

#include <iostream>

#ifdef I_AM_CERTAIN_THAT_MY_PLATFORM_SUPPORTS_THIS_CRAP
int main () {
    const char *foo = "fooo";
    switch ((foo[0]<<24) | (foo[1]<<16) | (foo[2]<<8) | (foo[3]<<0)) {
    case 'fooo': std::cout << "fooo!\n";  break;
    default:     std::cout << "bwaah!\n"; break;
    };
}
#else
#error oh oh oh
#endif

~/$ g++ -Wall -Wextra -DI_AM_CERTAIN_THAT_MY_PLATFORM_SUPPORTS_THIS_CRAP main.cc  &&  ./a.out
main.cc:5:10: warning: multi-character character constant
fooo!

edit: Oh look, directly below the grammar excerpt there is 2.13.2 Character Literals, Bullet 1:

[...] An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

But in the second bullet:

[...] The value of a wide-character literal containing multiple c-chars is implementation-defined.

So be careful.
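Following the disclaimer's advice that incompatible platforms should fail at compile time, here is a C11 sketch that refuses to build unless multi-character constants are packed the way GCC and Clang pack them (each later character shifted in at the low end, so the first character lands in the most significant byte). `PACK4` and `classify_mc` are hypothetical names, the packing is an assumption checked by the `_Static_assert`, and GCC will still emit its -Wmultichar warning.

```c
#include <stdint.h>

/* First character in the most significant byte, matching the assumed
   layout of multi-character constants such as 'sqrt'. */
#define PACK4(a, b, c, d)                                                   \
    (((uint32_t)(unsigned char)(a) << 24) | ((uint32_t)(unsigned char)(b) << 16) | \
     ((uint32_t)(unsigned char)(c) << 8)  |  (uint32_t)(unsigned char)(d))

/* Refuse to compile where multi-character constants are packed differently. */
_Static_assert('sqrt' == PACK4('s', 'q', 'r', 't'),
               "multi-character constants are not packed as assumed");

/* Returns 1 for "sqrt", 2 for "log2", 0 otherwise; id points to 4 bytes. */
int classify_mc(const char *id)
{
    switch (PACK4(id[0], id[1], id[2], id[3])) {
    case 'sqrt': return 1;
    case 'log2': return 2;
    default:     return 0;
    }
}
```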

score 1 · Answer 5 · answered Aug 17 '11 at 06:34

This is especially dangerous because of alignment: on many architectures an int must be 4-byte aligned, but character arrays need not be. On SPARC, for example, even if this code could compile (which it can't, because the string addresses aren't known until link time), a misaligned int load would raise SIGBUS.
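If you do want to view four bytes as one integer, copying them with `memcpy` avoids the misaligned load (and the strict-aliasing problem of the `*(int *)` cast). A sketch, where `load_u32` and `classify_loaded` are hypothetical helpers: the loaded value is in the machine's native byte order, so it should only be compared with values loaded the same way, not with a hand-shifted constant. Because such values are not integer constant expressions, the comparison is an `if` chain rather than a `switch`.

```c
#include <stdint.h>
#include <string.h>

/* Read 4 bytes from any address into a uint32_t without an unaligned access.
   The value uses the machine's native byte order. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 for "sqrt", 2 for "log2", 0 otherwise.
   Both sides are loaded the same way, so byte order cancels out. */
int classify_loaded(const char *id)
{
    uint32_t key = load_u32(id);
    if (key == load_u32("sqrt")) return 1;
    if (key == load_u32("log2")) return 2;
    return 0;
}
```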

score 1 · Answer 6 · answered Aug 17 '11 at 08:17

I just ended up using this macro, similar to case #3 in the question or phresnel's answer.

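#include <stdint.h>

/* Packs four chars with a in the most significant byte (big-endian order). */
#define CHAR4_TO_INT32(a, b, c, d) ((((int32_t)(a))<<24) + (((int32_t)(b))<<16) + (((int32_t)(c))<<8) + (((int32_t)(d))<<0))

/* Build the key with the same macro as the labels: *((int*) &structure->id) would use
   native byte order (and may be misaligned), so it would not match on little-endian x86. */
switch (CHAR4_TO_INT32(structure->id[0], structure->id[1], structure->id[2], structure->id[3])) {
   case (CHAR4_TO_INT32('S','Q','R','T')): printf("its a sqrt!"); break;
}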

score 0 · Answer 7 · answered Aug 16 '11 at 19:18


This is more C than C++.

union int_char4 { int32_t x; char y[4]; };

All members of a union start at the same address, essentially providing different types for the same set of bytes.

Given union int_char4 ic4;, ic4.x is a 32-bit int and ic4.y is the char array (which decays to a pointer to its first byte).

Since you want to learn, the implementation is up to you.
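A sketch of the union idea, using `uint32_t` and `char y[4]`. In C, reading a union member other than the one last written reinterprets the stored bytes (C11 §6.5.2.3); in C++ it is undefined behaviour, which is what the comments below object to. As with `memcpy`, the integer comes out in native byte order, and it still cannot serve as a case label because it is not a compile-time constant. `as_u32` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* The same storage viewed as one 32-bit integer or as four chars. */
union int_char4 {
    uint32_t x;
    char y[4];
};

/* Write the chars, read the integer (type punning: permitted in C, not in C++). */
uint32_t as_u32(const char id[4])
{
    union int_char4 ic4;
    memcpy(ic4.y, id, 4);
    return ic4.x;
}
```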

Peter Varga

In C99, the exactly-32-bit signed integer type is named `int32_t`, not `int_32`. – jwodder Aug 16 '11 at 19:25
I don't think this addresses the OP's original question. Can you elaborate on this? – templatetypedef Aug 17 '11 at 06:14
How do you use unions within case expressions, so that they reduce to constant expressions (e.g. translating "sqrt" to the corresponding int)? – i_want_to_learn Aug 17 '11 at 06:24
Legally, you can only read the one union value that you have written to most recently. I.e., if you write to `y[...]`, reading from `x` yields undefined behaviour. – Sebastian Mach Aug 17 '11 at 06:51

Also: The syntax for declaring arrays in C is `type name [length]`, not `type [length] name`. Together with the `int_32 / int32_t` confusion, this is enough for a downvote from my side (I wish everyone would justify their downvotes like this, btw) – Sebastian Mach Aug 17 '11 at 06:53
An example may help explain what I meant: union { uint32_t f; char fc[4]; } x; x.f = 1; memcpy(x.fc, "abcd", 4); switch(x.f) { case 1: printf("found case %.*s\n", 4, x.fc); break; case 1684234849: printf("found by number %.*s\n", 4, x.fc); break; default: printf("nothing matched\n"); break; } – Peter Varga Aug 17 '11 at 20:45
That's undefined behaviour, @Peter. – Sebastian Mach Aug 22 '11 at 08:08
