28

I am trying to understand some things about jump tables and their relationship to switch case statements.

I was told that a jump table is an O(1) structure that the compiler generates, which makes lookup of values essentially about as fast as you can get. However, in some cases a hash table/dictionary might be faster. I was also told this will only work if the switch case contains ordered data values.

Can someone please confirm or deny this and explain what a jump table is, its importance, and its time complexity versus using a dictionary or hash table? Thanks.

Brock Woolf

6 Answers

32

A jump table is an abstract structure used to transfer control to another location. Goto, continue, and break are similar, except they always transfer to a specific location instead of one possibility from many. In particular, this control flow is not the same as a function call. (Wikipedia's article on branch tables is related.)

A switch statement is how to write jump tables in C/C++. Only a limited form is provided (can only switch on integral types) to make implementations easier and faster in this common case. (How to implement jump tables efficiently has been studied much more for integral types than for the general case.) A classic example is Duff's Device.
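For reference, here is a sketch of Duff's Device in its memory-to-memory form (the original wrote to a single output register, so this is a variant; the name duff_copy and the parameter names are mine). It only works because switch behaves like a jump into a sequence of statements, with fall-through from one case label to the next:

```c
#include <stddef.h>

/* Copy count bytes from src to dst with the loop unrolled eight times.
   The switch jumps into the middle of the do-while body to handle the
   count % 8 leftover bytes first; each case then falls through. */
void duff_copy(char *dst, const char *src, size_t count)
{
    if (count == 0)
        return;                        /* the body below always copies at least once */

    size_t passes = (count + 7) / 8;   /* trips through the loop, counting the partial one */

    switch (count % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

With count = 13, the switch enters at case 5, copies 5 bytes, and then the loop runs once more for the remaining 8.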

However, the full capability of a jump table is often not required, such as when every case would have a break statement. These "limited jump tables" are a different pattern, which is only taking advantage of a jump table's well-studied efficiency, and are common when each "action" is independent of the others.


Actual implementations of jump tables take different forms, mostly differing in how the key-to-index mapping is done. That mapping is where terms like "dictionary" and "hash table" come in, and those techniques can be used independently of a jump table. Saying that some code "uses a jump table" doesn't imply by itself that you have O(1) lookup.

The compiler is free to choose the lookup method for each switch statement, and there is no guarantee you'll get one particular implementation; however, compiler options such as optimize-for-speed and optimize-for-size should be taken into account.

You should look into studying data structures to get a handle on the different complexity requirements imposed by them. Briefly, if by "dictionary" you mean a balanced binary tree, then it is O(log n); and a hash table depends on its hash function and collision strategy. In the particular case of switch statements, since the compiler has full information, it can generate a perfect hash function which means O(1) lookup. However, don't get lost by just looking at overall algorithmic complexity: it hides important factors.
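To make the perfect-hash idea concrete, here is a hand-written sketch. The case values (3, 17, 42, 1000), the handler names, and the choice of k % 6 as the hash are all made up for illustration; a compiler would pick its own function, if it uses this strategy at all. Because the keys are known up front, a hash can be chosen that sends each key to its own slot, so lookup is one hash plus one comparison:

```c
#include <stddef.h>

typedef int (*handler)(void);

/* Hypothetical handlers for the sparse case values 3, 17, 42 and 1000. */
static int on_3(void)    { return 30; }
static int on_17(void)   { return 170; }
static int on_42(void)   { return 420; }
static int on_1000(void) { return 10000; }

struct slot { unsigned key; handler fn; };

/* k % 6 sends the four keys to distinct slots: 42->0, 3->3, 1000->4, 17->5.
   Slots 1 and 2 stay empty (fn == NULL). */
static const struct slot slots[6] = {
    [0] = { 42,   on_42 },
    [3] = { 3,    on_3 },
    [4] = { 1000, on_1000 },
    [5] = { 17,   on_17 },
};

int dispatch(unsigned k)
{
    const struct slot *s = &slots[k % 6u];

    /* Other values can hash to an occupied slot, so the key must be compared. */
    if (s->fn != NULL && s->key == k)
        return s->fn();
    return -1;                         /* the "default:" case */
}
```

The cost is constant no matter how far apart the case values are, but it includes a division and a comparison that a plain dense table does not need.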

7

A jump table is basically an array of pointers to pieces of code to handle the various cases in the switch statement. It's most likely to be generated when your cases are dense (i.e. you have a case for every possible value in a range). For example, given a statement like:

switch (i) {
   case 1: printf("case 1"); break;
   case 2: printf("case 2"); break;
   case 3: printf("case 3"); break;
}

it could generate code roughly equivalent to something like this:

void case1() { printf("case 1"); }
void case2() { printf("case 2"); }
void case3() { printf("case 3"); }

typedef void (*pfunc)(void);

pfunc functions[3] = {case1, case2, case3};

unsigned idx = (unsigned)i - 1;   /* cases start at 1, the array at 0 */
if (idx < 3)
    functions[idx]();

This has O(1) (constant) complexity: the lookup costs the same no matter how many cases there are. A typical hash table also has roughly O(1) expected complexity, though the worst case is typically O(N). The jump table will usually be faster, but it will usually only be used if the table will be quite dense, whereas a hash table/dictionary works quite well even when the cases would be quite sparse.

Jerry Coffin
3

Suppose you had an array of procedures:

void fa() { 
 printf("a\n");
}

...

void fz() { 
 printf("it's z!\n");
}



typedef void (*F)();
F table[26]={fa,fb,...,fz};

Suppose you accept a character (from a-z) of input from the user and run the corresponding function (fc for 'c', say):

char c;
switch(c) {
   case 'a': fa();break;
   case 'b': fb();break;
   ...
   case 'z': fz();break;       
   default: exit(-1);
}

Ideally this would be replaced with something like:

if (c<'a' || c>'z') exit(-1);
else (*table[c-'a'])();

Naturally, you might make the table bigger so the range check wouldn't be necessary.
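A minimal sketch of that bigger table, assuming handlers that return a code instead of printing or exiting (on_a, on_z, on_default, and init_table are names I made up, and only two letters are filled in to keep it short):

```c
#include <limits.h>

typedef int (*handler)(void);

/* Hypothetical handlers; they return a code so the result is easy to check. */
static int on_a(void)       { return 1; }
static int on_z(void)       { return 26; }
static int on_default(void) { return -1; }

/* One slot for every possible unsigned char value, so any input is a valid index. */
static handler table[UCHAR_MAX + 1];

void init_table(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        table[i] = on_default;         /* every unused slot is the default */
    table['a'] = on_a;
    table['z'] = on_z;
}

int dispatch(char c)
{
    /* Convert to unsigned char so a negative char cannot index out of bounds. */
    return table[(unsigned char)c]();
}
```

The trade is memory for a branch: the table holds one pointer per possible input value even if only a few of them have real handlers.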

The compiler would do this for arbitrary code, not necessarily function calls only, and would do it by storing the address to jump to (essentially, a goto). Standard C doesn't directly support any sort of computed goto (indexing into a table or otherwise), although some compilers offer one as an extension, but the CPU instructions for it are pretty simple.
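If you want to see the "table of addresses plus goto" shape in source form, GCC and Clang offer a "labels as values" extension (&&label and goto *ptr). This is not standard C, and the function and label names below are hypothetical. The sketch mimics a three-case switch in which case 1 breaks and case 2 falls through into case 3:

```c
/* Requires the GNU C "labels as values" extension (GCC, Clang); not standard C. */
int dispatch_goto(int i)
{
    /* Table of code addresses, one per case 1..3. */
    static void *const targets[3] = { &&on_1, &&on_2, &&on_3 };
    int x = 1;

    if (i < 1 || i > 3)
        goto done;               /* the range guard doubles as the default */
    goto *targets[i - 1];        /* the computed jump */

on_1:
    x = 6;
    goto done;                   /* plays the role of break */
on_2:
    x++;                         /* no jump here, so control falls into on_3 */
on_3:
    x += 7;
done:
    return x;
}
```

Unlike a table of function pointers, all targets live in one function, so locals like x and fall-through work exactly as they do in a switch.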

Jonathan Graehl
  • Doesn't that mean that if I only switch on 'a' and 'z' then 24 slots of memory in that table is "wasted" ? – Pacerier Mar 11 '12 at 16:00
  • the dead stripper in the optimizer ought to catch that and remove the unused ones if it can be known at compile time. If it's a value from runtime (file read, user input etc), it would keep them all because it can't know what needs to stay. – Alan Wolfe Sep 08 '16 at 18:18
2

Compiling a switch statement can take many forms, depending on the cases. If the cases are close together, it is a no-brainer: use a jump table. If the cases are far apart, use if (case == value) or use a map. Or a compiler can use a combination: islands of jump tables, selected by if checks on each table's range.
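Written by hand, the "islands" combination might look like the sketch below. The case values (1 to 3 and 1000 to 1001) and the handler names are made up, and a real compiler emits jumps rather than function calls:

```c
typedef int (*handler)(void);

/* Hypothetical handlers for two dense clusters of cases: 1..3 and 1000..1001. */
static int on_1(void)    { return 10; }
static int on_2(void)    { return 20; }
static int on_3(void)    { return 30; }
static int on_1000(void) { return 40; }
static int on_1001(void) { return 50; }

static const handler low_island[3]  = { on_1, on_2, on_3 };
static const handler high_island[2] = { on_1000, on_1001 };

int dispatch(int i)
{
    unsigned u = (unsigned)i;    /* unsigned arithmetic wraps, so one compare checks both bounds */

    if (u - 1u < 3u)             /* island 1: cases 1..3 */
        return low_island[u - 1u]();
    if (u - 1000u < 2u)          /* island 2: cases 1000..1001 */
        return high_island[u - 1000u]();
    return -1;                   /* default */
}
```

Two small tables and two range checks replace a single table that would need about a thousand mostly empty slots.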

Richard Pennington
  • 1
    Speaking of hash tables, the compiler could definitely use perfect hashing rather than if checks + islands. – Jonathan Graehl Dec 03 '09 at 04:54
  • The only answer that doesn't get sidetracked into implementing its own jump table and stays on the key point: switch statements *act* like jump tables, *including* fall-through, but may have many different implementations, depending on many factors. –  Dec 03 '09 at 05:09
  • 2
    @Roger: I have to disagree. He specifically asked: "Can someone please ... explain what a jump table is, it's importance and the time complexity versus using a dictionary or hashtable." This answer does handwaving instead of answering the question (at all). – Jerry Coffin Dec 03 '09 at 05:27
  • You're right that it doesn't answer the second (and less important to the OP, the way I interpret it) part of the question, but it still doesn't get sidetracked. Let's see if I can do better. –  Dec 03 '09 at 05:33
  • @Roger: The first part was to confirm or deny "this" (apparently that a hash table might be faster in some cases), but this answer doesn't seem to attempt to address that either... – Jerry Coffin Dec 03 '09 at 05:49
  • Jerry: The OP really wants "to understand some things about jump tables and its relationship between a switch case statement." (Again, apparently you interpret him differently.) The fact that the OP gets sidetracked doesn't mean a good answer must also. –  Dec 03 '09 at 06:16
1

A jump table is simply an array of function pointers; you can picture a jump table roughly like so:

int (*functions[10])(); /* Array of 10 Function Pointers */

From my understanding, this is used with a case statement like so: each condition, case _, will be an index into this array, so for example:

switch( a ) {
    case 1:  // (*functions[1])() // Call function containing actions in case of 1
        ...  
    case 2:  // (*functions[2])() // Call function containing actions in case of 2
        ...
}

Each case simply becomes functions[a]. This means that accessing functions[9] is just as quick as accessing functions[1], which gives you the O(1) time you mentioned.

Obviously, if you have case 1, and case 4907, this isn't going to be a good method, and the hash table/dictionary methods you mentioned may come into play.
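For sparse cases like those, one possible "dictionary" (my choice for illustration, not something a compiler is guaranteed to emit) is a sorted array searched with the standard bsearch, which costs O(log n) comparisons instead of a huge, mostly empty table. The third key and the handler names are hypothetical:

```c
#include <stdlib.h>

typedef int (*handler)(void);

/* Hypothetical handlers for the sparse cases 1, 4907 and 9000. */
static int on_1(void)    { return 10; }
static int on_4907(void) { return 20; }
static int on_9000(void) { return 30; }

struct entry { int key; handler fn; };

/* Must stay sorted by key for bsearch to work. */
static const struct entry entries[] = {
    { 1,    on_1 },
    { 4907, on_4907 },
    { 9000, on_9000 },
};

/* bsearch passes the searched-for key first and an array element second. */
static int compare_key(const void *key, const void *elem)
{
    int k = *(const int *)key;
    int e = ((const struct entry *)elem)->key;
    return (k > e) - (k < e);
}

int dispatch(int i)
{
    const struct entry *e = bsearch(&i, entries,
                                    sizeof entries / sizeof entries[0],
                                    sizeof entries[0], compare_key);
    return e ? e->fn() : -1;     /* -1 plays the role of default */
}
```

The table needs one entry per case rather than one per value in the range, at the price of a few comparisons per lookup.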

Dave
  • Not exactly; case fall-through and arbitrary code using locals, in the case statement, still work properly with a jump table. The function pointers are just a pedagogic vehicle. – Jonathan Graehl Dec 03 '09 at 04:50
1

To further elaborate on Jerry's answer and others:

Given:

int x=1;
switch (i) {
   case 1: x=6; break;
   case 2: x++;
   // Fall through
   case 3: x+=7; break;
}

you could have something like the following:

int f1() {return 6;}
int f2() {return 1+f3();}
int f3() {return 8;}

Then the compiler could use a jump table to index {f1, f2, f3}.

The compiler can also inline when creating the table, so that f1, f2, and f3 set x directly to 6, 9, and 8.

If you wrote the functions and rolled your own jump table, f1, f2, and f3 could end up anywhere in memory, whereas the compiler knows to put the case code close to the switch, giving much better code locality than you could.

Note that in many cases the compiler will generate a guard to check whether i is in range (or to handle the default). If you are sure that i is always one of the cases, you could skip that guard.

The interesting thing is that for a small number of cases, and under different compiler flags (this is compiler dependent), the switch would not use a table but would just do ifs, similar to:

if (i==1) x=f1();
else if (i==2) x=f2();
else if (i==3) x=f3();

or it might optimize this (where simple tests are one instruction) to:

x=(i==1) ? f1()
: (i==2) ? f2()
: (i==3) ? f3()
: x;

The best advice is to look at the generated assembly to see what the compiler did to your code on your architecture. g++ on Linux/Intel will generate something like the following if there is a jump table.

(Note that I had to go to 5 case statements to force the jump table; it used ifs below that number of case statements.)

Note that small holes in the case values still get entries in the jump table, which point to the default.

int foo(int i)
{
   int x=1;
   switch (i) {
       case 1: x=6; break;
       case 2: x++;
        // Fall through
       case 3: x+=7; break;
       case 4: x+=2; break;
       case 5: x+=9; break;
    }
  return x;
}

would generate the following assembly code (// comments are mine):

        cmp     edi, 5                     //make sure it is not over 5
        ja      .L2                        //jump to default case
        mov     edi, edi
        jmp     [QWORD PTR .L4[0+rdi*8]]   // use the jump table at label L4:
.L4:
        .quad   .L2                        // if i=0, set x=1 (default)
        .quad   .L9                        // f1() see below
        .quad   .L10                       // f2() see below
        .quad   .L6                        // f3() see below
        .quad   .L7                        // f4() see below
        .quad   .L8                        // f5() see below
.L10:
        mov     eax, 9                     // x=9
        ret
.L9:
        mov     eax, 6                     // x=6
        ret
.L8:
        mov     eax, 10                    // x=10
        ret
.L6:
        mov     eax, 8                     // x=8
        ret
.L7:
        mov     eax, 3                     // x=3
        ret
.L2:
        mov     eax, 1                     // default, x was 1, noop is: x=1
        ret
Glenn Teitelbaum