15

Fortran has a computationally efficient way called a 'computed goto'. The construct uses an index into a branch table to perform a direct goto. If I remember correctly the syntax is:

go to index (label1, label2, ...)

where the index is used to reference a code pointer (label) in the parenthesized list.

I have a case where a computed goto is a better solution than a switch statement and would like to construct one, but I can't figure out how.

Now before the jibes and slings arrive, it is possible for the compiler to optimize a computed goto, but I have no guarantee that it will.


A switch statement can always be used. In some cases a switch statement can be optimized to a jump table (the implementation of a computed goto).

However, this is only possible when the range of case values is an almost dense covering (there is almost a case statement for each integer in the range of low values to high values). When this is not the case, the implementation will probably be a binary tree. The compiler writer has a choice to optimize to a jump table when appropriate or not. Where a binary tree will always satisfy the semantics of a switch statement, where sometimes a jump table is sufficient lets me ask whether I can guarantee a jump table when appropriate. I have no control over the compiler writer.

As a simple case, I often write lexers (FSMs), and I use three data constructs, one to map the input into an acceptable alphabet, one to perform node transitions, and one to execute some code based on the current state and the input value. The implementation of the FSM is a Mealy machine, not a Moore machine, so actions are performed on arcs (transitions) and not on nodes.

The actions performed are typically small, often no more than a line of source code. I recognize that functions can be used, and that their use removes the need for a jump table. But I believe that I can not 'point' to an inline function, and therefore functions are closed-form callable procedures.

This is less efficient in most cases, than a switch statement with or without jump table optimization. If I can use a jump table then I avoid the issue of what the compiler writer thinks of optimization and am able to write efficient code.

As to the general case brought up below about the issues related to a Fortran computed goto: This is not a criticism of that/those comment. But the qualitative issues, even though they are true, does not answer the question.

There is an answer below using void* &&label; and I'd like to thank you for that. But, alas, as you pointed out this is non-standard C/C++ and is likely not to be present in the future. So, it's better not to do this.

I hope that I've answered the 'get a better compiler' comment. And I hope I've at least addressed the issues with using function pointers. And at last, this was a moment of curiosity for me. I didn't think that I should provide the germicidal history of why I thought the question had some carrying power. But now I know. Whenever, and I mean whenever, I write to this group I better tell you what all my duckies are so that they can be shot down good and proper.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
lostbits
  • 928
  • 1
  • 9
  • 23
  • 1
    A `switch` statement maybe? – user0042 Jul 28 '17 at 18:25
  • If you want to avoid the "jibes and slings", you should at least say *why* a `switch` is less efficient – Passer By Jul 28 '17 at 18:29
  • 3
    "I have a case where a computed goto is a better solution than a switch statement and would like to construct one but can't figure out how." if that the case optimizer will do that on `switch`. If you do not believe - implement that in asm and try to beat compiler optimizer. – Slava Jul 28 '17 at 18:29
  • 4
    If your compiler doesn't optimize a switch to a computed goto, you probably need to update your compiler to any decent compiler – Justin Jul 28 '17 at 18:31
  • 2
    But in order to implement and profile, OP has to know how to implement. That's why I think this is a good question. – user4581301 Jul 28 '17 at 18:31
  • @user4581301 I do not think you can directly implement that using standard C++. It is not recommended to use regular `goto` due to readability issue. Computed goto would be even worse. – Slava Jul 28 '17 at 18:34
  • As an ex Fortran programmer, whenever I came across a computed goto my heart would sink - it's just unreadable, and a major source of bugs. Thankfully, C++ doesn't (directly) have such a construct. –  Jul 28 '17 at 18:51
  • @NeilButterworth, how is the switch any better than computed goto in gcc extension syntax? – SergeyA Jul 28 '17 at 18:55
  • @Sergey switch has labels, computed goto does not. –  Jul 28 '17 at 18:58
  • Why do you ask, and what is your concrete use case? **Edit your question** to improve and motivate it. – Basile Starynkevitch Jul 28 '17 at 19:02
  • 1
    @NeilButterworth, computed goto in gcc/clang extension (the only available form of computed goto in C++ code) do have labels. – SergeyA Jul 28 '17 at 19:04
  • @sergey My comment was about Standard Fortran (F77, being the last one I've seriously programmed in) and Standard C++. –  Jul 28 '17 at 19:05
  • @Neil Butterworth The computed goto and assigned goto, and indeed all goto's, in F66/F77 have labels. The labels are the numbers given in column 1-5 of the Hellerith card.I don't remember if any F77 extensions included non-numeric labels. – lostbits Jul 28 '17 at 20:39
  • @Arthur There is a big difference between a goto target being a line number to jump to (F77) , and a label being an integer constant (C and C++) which matches the value being despatched on. –  Jul 28 '17 at 21:27
  • 1
    If you are sure to compile with GCC or Clang, and you don't care about long-term portability (i.e. agree to change your code in a few years when needed), then using GCC extension is very sensible (and it is done by a lot of free software). – Basile Starynkevitch Jul 29 '17 at 04:05
  • The standard C++ switch and Fortran computed goto are pretty much the same thing and compilers convert them to roughly the same code. It isn't clear why you expect significant performance differences. – n. m. could be an AI Feb 07 '20 at 20:52
  • see here: https://www.youtube.com/watch?v=IAdLwUXRUvg – alfC Aug 17 '22 at 15:36
  • Please show some code, the assembly it is compiled to, ans the assembly you would expect from computed goto. – n. m. could be an AI Oct 01 '22 at 12:42

7 Answers7

26

If you compile with a recent GCC compiler (e.g., GCC 7 or GCC 6) - or even, for C code, older versions of GCC, you could use its labels as values language extension (so outside of C++11 or C++14 standards), which works with both C & C++. The prefix && operator gives the address of a label, and the goto is computed if followed by * indirection operator. You'll better have target labels starting some blocks.

For example:

#include <map>

int foo (std::map<int,int>& m, int d, int x) {
    static const void* array[] = {&&lab1, &&lab2, &&lab3 };
    goto *array[d%3];
lab1: {
        m[d]= 1;
        return 0;
    };
lab2: {
        m[2*d]=x;
        return 1;
    }
lab3: {
    m[d+1]= 4*x;
    return 2;
    }
}

(of course, for the above code, a plain switch would be preferable, and probably as efficient)

BTW, recent Clang (e.g., clang++-5.0) also accepts that extension.

(Computed gotos are not exception-friendly, so they could disappear in future versions of GCC for C++.)

And with threaded code programming techniques, you could write some quite efficient (bytecode) interpreters using that, and in that particular case the code stays very readable (because it is very linear) and is quite efficient. BTW, you could hide such computed gotos with macros and conditional compilation - e.g., #if-s- (e.g., to use instead switch on compilers not supporting that extension); then your code would be quite portable. For an example in C, look into OCaml's runtime/interp.c.


For referenceEli Bendersky, The computed goto version is faster because of two reasons:

  1. The switch does a bit more per iteration because of bounds checking.
  2. The effects of hardware branch prediction.

There are many variants of a switch which a compiler can implement.

  1. Binary search of the switch space with if constructs.
  2. A table of 'case' location (computed goto like).
  3. A computed branch, requiring all cases have the same code size forming a 'code array'.

For the OPs state machine dispatch, the item 2 is the best case. It is the only construct which does not require a return to the main switch dispatch location. So, the break; can transfer control to the next case. This is why the mechanics are more effective for branch prediction.

artless noise
  • 21,212
  • 6
  • 68
  • 105
Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
4

Yes (not directly), by using switch or creating a table of function pointers or function objects.

Most compilers will translate the switch into a computed goto (a.k.a. jump table).

An array of function pointers is about the same thing. You dereference the array slot to execute the function.

You could also use std::map or the other containers with function pointers.

Edit 1: Example of computed goto using array

typedef (void) (*Pointer_To_Function)();
void Hello()
{
  cout << "Hello\n";
}

void Bye()
{
  cout << "Bye\n";
}

static const Pointer_To_Function function_table[] =
{
  Hello,
  Bye,
}

int main()
{
  (*function_table[0])();
  (*function_table[1])();

  // A "computed goto" based on a variable
  unsigned int i = 0;
  (*function_table[i])();
  return 0;
}

Edit 2: Computed goto using switch

int main()
{
  int i = 1;
  switch (i)
  {
    case 0: Hello(); break;
    case 1: Bye(); break;
  }
  return 0;
}

Tell your compiler to generate an assembly language listing for each of the above examples.

Most likely, they will look like a computed goto or a jump table. If not, raise the optimization level.

To achieve a good optimization for the switch or array, the case values should be contiguous within a range. For selecting ranges with holes, a std::map of may be more efficient (or using a table for smaller quantities).

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
  • 2
    Most compilers will not optimize switch of 2 items into computed goto. – SergeyA Jul 28 '17 at 18:47
  • The example is too simple. Take a switch statement with 50 case statements with values ranging from 1 .. 1,000,000. A jump table would contain 1,000,000 entries with only 50 representing case statements. The compiler writer has to choose when to construct a jump table, and therefore there is no certainly that the code will be a jump table even if feasible. – lostbits Jul 28 '17 at 20:27
  • I don't understand your objective. Take a computed `goto` with 50 non-contiguous values. How does the compile generate the code? In situations where the selection values are not contiguous, you can't have an array of address as this may take up too much space and leave holes. In these cases, one would have to search a table (associated map) to find the matching index, then call the function associated with the key. – Thomas Matthews Jul 28 '17 at 20:40
  • @SergeyA: So, how many cases do I need in order to illustrate the point? – Thomas Matthews Jul 28 '17 at 20:41
  • @ThomasMatthews, I think, your second example should not be there at all. Function pointers are a good illustration into computed goto alternative, but switch statements are, in my view, not. – SergeyA Jul 28 '17 at 20:44
  • Thomas Matthews converting the switch into a binary tree will always work. Converting a switch statement into a jump table will always work but is sometimes impractical. If the compiler does both then the border between them is at the hands of the compiler writer. Further, if the case statements cover and almost dense set of values (n .. k with a fiew case statements missing) where it is possible to create a jump table ontining null entries,, then it is again a compiler writer decision. – lostbits Jul 29 '17 at 20:05
3
using jump_func_t = void(*)(void const*);
template<class F>
jump_func_t jump_func() {
    return [](void const*ptr){ (*static_cast<F const*>(ptr))(); };
}
template<class...Fs>
void jump_table( std::size_t i, Fs const&...fs ) {
  struct entry {
    jump_func_t f;
    void const* data;
    void operator()()const { f(data); }
  };
  const entry table[] = {
    {jump_func<Fs>(), std::addressof(fs)}...
  };
  table[i]();
}

Test code:

int x = 0, y = 0, z = 0;
jump_table( 3,
    [&]{ ++x; },
    [&]{ ++y; },
    [&]{ ++z; },
    [&]{ ++x; ++z; }
);
std::cout << x << y << z << "\n";

outputs 101.

Live example

If you want large amounts of gaps, extra work will have to be done. Short "gaps" can be handled with invalid jump target:

using action = void();
static action*const invalid_jump = 0;

which should segmentation fault if actually called.

For a really sparse table, you'd want to pass in compile-time constants for table size and compile time indexes of each target, then build the table up from that. Depending on how efficient you want to be, that may require reasonably fancy compile time programming.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • just noticed your answer and I will be trying it out in the next few days. Thanks. – lostbits Aug 07 '17 at 21:00
  • I did something like this as a part of event handler engine inside of a quickie (analogy with certain bedroom activity) project that should not have any dependencies. It's more like saving on writing time rather than execution, we have state-aware functors here – Swift - Friday Pie Feb 07 '20 at 20:08
2

There are some useful answers here already but I'm a little surprised nobody's mentioned tail call optimization yet, because depending how you have structured your implementation, it's possible the compiler is already doing what you'd hoped for!

Essentially, you can get a computed goto in a safe and structured way if you write each instruction as an individual function, which calls the "next instruction" function as the last thing it does. As long as no processing is required after the call, and the functions implementing the operations all have the same signature (and the same signature as the "next instruction" function), the call should be optimized into a computed goto automatically.

Of course, optimization has to be enabled -- -O2 for GCC or /O2 for MSVC -- otherwise the function calls will recurse and consume stack. This is at least functionally correct, and if your execution traces are short, say less than 10k sequential operations, you should be able to debug with optimization disabled on a modern machine and OS.

Tail call elimination has been well understood since at least the days of LISP. As I understand, it's available on most C/C++ compilers since around 2000 at least, and definitely on LLVM, GCC, MSVC, and ICC. If you're building with LLVM you can use __attribute__((musttail)) to request this optimization even if optimization is generally disabled -- GCC doesn't seem to have caught up yet, but if you have to you can pass "-foptimize-sibling-calls" even if you're disabling other optimizations.

(This comment relating to the implementation status of that attribute in GCC is particularly relevant for its discusion of the merits of tail calls for this specific use case of replacing computed gotos: https://gcc.gnu.org/pipermail/gcc/2021-April/235891.html)

Obligatory concrete example:

#include <stdio.h>
#include <stdint.h>

using operation_t = void (*)();

static void op_add();
static void op_print();
static void op_halt();

// Test program
operation_t program[] = { op_add, op_print, op_halt };

// VM registers
int32_t ra = 5, rb = 3;
operation_t const * rpp = program;

static void exec_next_op() {
    // Call the next operation function -- since nothing is
    // returned from our stack frame and no other code needs to
    // run after the function call returns, the compiler can
    // destroy (or repurpose) this stack frame and then jump
    // directly to the code for the next operation.

    // If you need to stop execution prematurely or do debug
    // stuff, this is probably where you'd hook that up.

    (*(rpp++))();
}

static void op_add() {
    ra += rb;
    // EVERY operation besides halt must call exec_next_op as
    // the very last thing it does.
    exec_next_op();
}

static void op_print() {
    printf("%d\n", ra);
    // EVERY operation besides halt must call exec_next_op as
    // the very last thing it does.
    exec_next_op();
}

static void op_halt() {
    // DON'T exec_next_op(), and the notional call chain unwinds
    // -- notional because there is no call chain on the stack
    // if the compiler has done its job properly.
}

int main(int argc, char const * argv[]) {
    // Kick off execution
    exec_next_op();
    return 0;
}

On Compiler Explorer: https://godbolt.org/z/q8M1cq7W1

Observe that with -O2 on x86-64, GCC inlines the exec_next_op() call and op_* (except halt) end in an indirect "jmp rax" instruction.

For completeness I went through quite a few architectures, and support was fine for the major ones -- MSVC, GCC, and Clang on x86/x86-64, ARM/ARM64, and RISC-V, but a couple older and more obscure architectures did fail to optimize.

GCC targeting ESP32 was the only one a current developer might be bothered about, but I noticed GCC targeting SPARC also wasn't able to do it. I suspect they'd handle static tail recursion but calling a function pointer does imply an extra level of special case handling which is really unusual to need.

If you're interested in reading what people here have to say about compatibility, you could take a look at Which, if any, C++ compilers do tail-recursion optimization?

GCC is able to optimize the C++ pointer-to-member version as well:

#include <stdint.h>
#include <stdio.h>

class Interpreter {
   public:
    using operation_t = void (Interpreter::*)();

    operation_t program[3];

    // VM registers
    int32_t ra = 5, rb = 3;
    operation_t const* rpp = program;

    Interpreter()
        : program{&Interpreter::op_add, &Interpreter::op_print,
                  &Interpreter::op_halt} {}

    void exec_next_op() { (this->**(rpp++))(); }

    void op_add() {
        ra += rb;
        exec_next_op();
    }

    void op_print() {
        printf("%d\n", ra);
        exec_next_op();
    }

    void op_halt() {}
};

int main(int argc, char const* argv[]) {
    // Kick off execution
    Interpreter interp;
    interp.exec_next_op();
    return 0;
}

https://godbolt.org/z/n43rYP81r

j_l
  • 111
  • 1
  • 2
  • 1
    This is not exactly what I had in mind. A computed goto uses and indexed array with pointers to statements. Your scheme use functions, and depends on tail-end optimization, and then cycle through a succession of functions. A Fortran goto looks like goto(stmt1, ...stmtk), var. Storage of a statement label in an array and a branch to the statement based on an indexing variable. However, this type of scheme creates havoc with creating a dependency graph and ensuring that the state on entry to a statement is correct. – lostbits Feb 04 '23 at 22:44
  • If your complaint is really that you miss a feature of Fortran _syntax_, I am confident no answer will be satisfying. But, that's different from saying that C++ can't neatly represent a solution with the performance characteristics you're looking for. I used the function pointers as tokens, but an indirect lookup would have optimized similarly. If you want to refer to local state, you can make the interpreter a monolithic function, with lambdas capturing the locals. Personally, I wouldn't do it, either in C++ _or_ in Fortran. – j_l May 27 '23 at 23:09
  • And in fairness to both of us, my purpose in adding an answer was more to fill a gap I saw in the information here, than to answer your question exactly as you framed it. In particular, "I avoid the issue of what the compiler writer thinks of optimization and am able to write efficient code" set up a false premise: you cannot possibly guarantee you will optimize better for future architectures that don't exist yet, than future compiler writers. Someone who misses Fortran should be aware of the many optimization techniques we don't use anymore because they pessimize now. – j_l May 27 '23 at 23:19
1

... whether I can guarantee a jump table when appropriate. I have no control over the compiler writer.

No, you cannot. In fact, given that this is C, even if you were to cleverly implement your own jump table, you have no guarantees that the compiler won’t undo your efforts. If you want a guarantee, you have to write in assembly yourself. Compilers are only held to the "as-if" standard. They must do something which as-if they did what you told them to do.

Most compilers are very good at this sort of optimization. You aren't the first developer to develop parsers. In fact, I would expect them to use jump tables when efficient and not use jump tables when it is inefficient. You might accidentally make it do something less efficient. Compilers typically have a large database of CPU timings that they draw upon to decide how best to write opcodes.

Now, if you can't make your code into a garden variety switch statement, you might be able to emulate this with a custom switch statement.

To replace go to index (label1, label2, ...), try

switch(index) {
    case trampoline1:
        goto label1;
    case trampoline2:
        goto label2;
    ...

}

It looks like those trampolines are "real," but I would expect any compiler worth its salt to do a pretty darn good job of optimizing this for you. Thus, this solution is compiler-specific, but it should work out in a wide variety of reasonable compilers. Statically-computable control flow is something compilers have been eating up for 30+ years. For instance, I know every compiler I've worked with can take

bool found = false;
for(int i = 0; i < N; i++)
{
    if (matches(i))
    {
        found = true;
        break;
    }
}

if (!found)
    return false;

doSomething();

And turn it into effectively

for(int i = 0; i < N; i++)
{
    if (match(i))
       goto label1;
}

return false;

label1:
doSomething();

But in the end, no, there isn't any construct to guarantee beyond a shadow of a doubt that a particular Fortran-inspired method is used under the hood in C. You will always have to test against your compiler. However, trust the compiler. And profile. Make sure this is a hot spot before you choose to die on a pyre over it.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Cort Ammon
  • 10,221
  • 31
  • 45
1

With modern compilers, you don't need computed goto.

I created the following code:

void __attribute__((noinline)) computed(uint8_t x)
{
        switch (x)
        {
                case 0:
                        goto lab0;
                case 1:
                        goto lab1;
                case 2:
                        goto lab2;
                case 3:
                        goto lab3;
                case 4:
                        goto lab4;
                case 5:
                        goto lab5;
                default:
                        abort();
        }
lab0:
        printf("A\n");
        return;
lab1:
        printf("B\n");
        return;
lab2:
        printf("C\n");
        return;
lab3:
        printf("D\n");
        return;
lab4:
        printf("E\n");
        return;
lab5:
        printf("F\n");
        return;
}
int main(int argc, char **argv)
{
        computed(3);
        return 0;
}

With both gcc (9.4.0) and clang (10.0.0), at optimization level of 0 no less, I checked the assembly code and it's indeed using computed goto at already optimization level 0 (with greater optimization levels clang figures out the printfs are very similar and simply calls one of them with a variable as its argument). I also checked C++ mode and it's doing the same.

You just need to create a long enough switch case table that begins at 0, has continuous indices, and where every case has a goto (except maybe the default case), and it will be automatically turned into a computed goto.

The problem of using computed goto explicitly, is that it's a non-standard language extension and makes your code compile only on non-standard compilers. It would be much better to compile on standard compilers too, even if a really bad compiler can't figure out that computed goto is the best approach and checks x against a long list of constants repeatedly to figure out the correct branch (which would contain just a single jump instruction).

juhist
  • 4,210
  • 16
  • 33
  • I agree to all the points. In my original response to someones original answer (sorry, just had to do it) I pointed out that the density of the numbers in a case statement may alter the computed goto. You have qualified this by saying [0'] the first case must have a value '0', and [1] the number range must be contiguous. I guess my question is on both points. Must the number range start at zero, and can there be a modest amount of missing numbers. As you point out, there is no standard way to contort the language to allow the construction of a computed goto. – lostbits Feb 04 '23 at 22:37
0

I can confirm what @juhist already mentioned. You can generate the computed goto implicitely by declaring an index into a jumptable (best is an enum) and assigning this to a variable as you go. The following proves this even works for cases where you dynamically branch inside a case statement (which is the sole reason of a computed goto).

Demo

#include <cstdio>
#include <cstdint>

volatile uint8_t number = 2;

int main() {
    enum class Label {
        label1,
        label2,
        label3,
        label4,
        done
    };

    Label next = Label::label1;

again:

    switch(next) {

        case Label::label1:

            printf("label1\n");
            next = Label::label3;
            goto again;

        case Label::label2:

            printf("label2\n");
            if (number == 2) {
                next = Label::done;
            } else {
                next = Label::label4;
            }
            goto again;

        case Label::label3:

            printf("label3\n");
            next = Label::label2;
            goto again;

        case Label::label4:

            printf("label4\n");
            next = Label::done;
            goto again;

        case Label::done:

            printf("done\n");
            break;
    }
}

Prints:

label1
label3
label2
done

The generated assembly is indeed the same as for a computed goto in gcc:

.LC0:
        .string "label1"
.LC1:
        .string "label3"
.LC2:
        .string "label2"
.LC3:
        .string "label4"
.LC4:
        .string "done"
main:
        push    rcx
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        mov     edi, OFFSET FLAT:.LC1
        call    puts
        mov     edi, OFFSET FLAT:.LC2
        call    puts
        mov     al, BYTE PTR number[rip]
        cmp     al, 2
        je      .L2
        mov     edi, OFFSET FLAT:.LC3
        call    puts
.L2:
        mov     edi, OFFSET FLAT:.LC4
        call    puts
        xor     eax, eax
        pop     rdx
        ret
number:
        .byte   2

Gcc detects that the next variable will actually not be changed when jumping to the top of the jumptable and just replace it with a direct jump instruction. Perfect! But what about clang and msvc?

clang x64 trunk seems to struggle a bit more, as with the same optimization (-Os) there's still a jump to the top of the switch statement, but it built the jumptable at least.

MSVC x86 latest also optimizes away the jump to the top of the jumptable and directly jumps to the precomputed labels - all good. But MSVC x64 seems to be drunk... not only is there still a jump to the top of the jumptable but it even does a horrible if / else over the switch values - disgusting! Can anyone explain this?

So in summary, gcc x86 and x64 good, MSVC x86 good, x64 horrible, clang x64 could be better. The question remains how good compilers are at optimizing more involved examples.

glades
  • 3,778
  • 1
  • 12
  • 34
  • 1
    The proposed solution depends on a compiler implementation. If the compiler does what is expected, you get a jump table. Otherwise, you don't. That's not the solution that I was looking for. What I wanted is to use standard C/C++ source code, and not a particular compiler, to generate a jump table. I would expect a computed goto if the density of the enums is close to 1: density = (maximum value - minimum value)/(number of entries). Density = 1 should yield a computed goto. As to the Microsoft C/C++ compiler. I have been, and am, unimpressed. code. – lostbits Apr 17 '23 at 21:40
  • @lostbits Well you could argue that everything depends on a compiler implementation unless you were to write your jumptable in assembly. Unfortunately there's no portable language construct to do that and that's a shame. But unless you're bound to MSVC x64, I think above "approximation" is the best you can do while also making it portable. – glades Apr 18 '23 at 06:02