Use of Goto within lexer/parser

Question

I have a lexer/parser pair (which I cribbed off someone else years ago). I am going to be adding a couple of features and thought I would first standardise the use of while(true) containing multiple if/else if/else vs a switch which uses a goto to jump back to before the switch.

(Before the flames start, I don't normally use goto as it's evil etc. etc.)

The problem with a while(true) and a nested switch is that the break only breaks out of the switch and cannot get outside the while.

I have done some searching here and seen suggestions to use a return from inside the switch. Whilst this would work in some cases, in others, there is some processing after the while but before returning. Duplicating this code in multiple places doesn't really appeal.

I could also introduce a boolean flag and use that in the while statement to decide whether to break out of the while but that also doesn't appeal as it adds noise to the code.

The current way in the parser of using if/else if/else instead of an inner switch works but I do have a preference for a switch if possible.

The lexer code in general seems to get around this by removing the while(true), putting a label just before the start of the switch and using goto to continue the loop. This leaves break meaning "stop the loop" and, to be honest, seems the cleanest way, but it does involve the dreaded goto.
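For concreteness, here is a minimal sketch of that lexer shape. It is written in C only because no real code is shown here; the comments suggest the actual code is C#, where the same structure should carry over. The function `count_words` and its rule for what counts as a word are hypothetical and not taken from the real lexer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexer fragment: counts whitespace-separated words in src. */
size_t count_words(const char *src)
{
    assert(src != NULL);
    size_t words = 0;
    const char *p = src;

next_char:                      /* the single label: "go round the loop again" */
    switch (*p) {
    case '\0':
        break;                  /* break leaves the switch, which ends the scan */
    case ' ':
    case '\t':
    case '\n':
        p++;                    /* skip whitespace */
        goto next_char;
    default:
        words++;                /* start of a word: consume it to its end */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        goto next_char;
    }

    /* any processing needed after the loop goes here, exactly once */
    return words;
}
```

With this layout, `count_words("a bc  d")` walks three words and then falls out of the switch at the terminating NUL. There is no enclosing loop, so one level of indentation disappears.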

Going back to the while(true), I can also see a third way. Use a label after the while(true) and let the switch code use goto to get to it when the loop should end. Break would then mean exit the switch but continue the loop.

So what are the panel's views on this? Is goto too abhorrent to use? Or is it OK when there is just a single label to jump to and it reduces indenting and produces otherwise clear code? Should parsers/lexers get special license to use gotos?

I can provide some sample code if it would help.

Goto is not evil. Its use is perfectly justified if you're coding something like a state machine. Wait a minute... A parser most likely *is* a state machine. — SK-logic, Jun 30 '11 at 08:59
Indeed, look at the Linux Kernel source code, goto's in pretty neat locations, wonderful code! — Giovanni Funchal, Jun 30 '11 at 09:00
Linux Kernel is different because it's written in C and there is no more simple/clear/elegant way to handle errors and free resources other than 'goto cleanup'. Also, performance considerations might make us use one long function instead of splitting it up. None of this is valid for C#, there is just no reason to use goto there except for auto-generated code. — Konstantin Oznobihin, Jun 30 '11 at 09:10
possible duplicate of [GOTO still considered harmful?](http://stackoverflow.com/questions/46586/goto-still-considered-harmful) — Hans Passant, Jun 30 '11 at 09:34
@Simon: If you answered on a question by yourself, just mark your own answer as correct, please. — abatishchev, Jun 30 '11 at 12:22
@abatischev: OK, will do. What about in cases where there is more than one answer (or reasonable opinion) and it is hard to choose between them. Is it better to just choose one and hope those who weren't selected aren't upset? Choose the earliest (correct) response? Choose that which has the most upvotes? — Simon Hewitt, Jun 30 '11 at 12:34

Ira Baxter · Answer 1 · 2011-06-30T09:58:58.703

Use of GOTO in disciplined ways is fine. Languages which don't allow breaks out of arbitrarily nested block structures cause this question to be raised repeatedly, and have done since the 1970s, when people beat the question of "what control flow structures should a language have" to death. (Note: this complaint isn't special to lexers/parsers.)

You don't want the scheme with boolean; it just adds extra overhead to the loop checks and clutters the code.

I think you have this problem:

    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  <want to break out all blocks>
            ...
        }
    }

The proper cure with a good language is:

    blocks_label:
    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  exit blocks_label;
            ...
        }
    }

if the exit construct exists in your language, that exits the blocks labelled by the named label. (There's no excuse for a modern language to not have this, but then, I don't design them.)

It is perfectly satisfactory to write, as a poor man's substitute:

    <if/while/loop head> {
        <if/while/loop head> {
            ...
            if <cond>  goto exit_these_blocks;
            ...
        }
    }
    exit_these_blocks:  // my language doesn't have decent block exits
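As a concrete instance of the poor man's substitute, here is a minimal sketch in C of the while(true)-plus-switch case from the question. The function name `sum_digits_until_semicolon`, the label `done` and the "grammar" are all hypothetical, chosen only to show which statement exits what.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parser fragment: sums the decimal digits in src up to the
   first ';' (or the end of the string), ignoring every other character. */
int sum_digits_until_semicolon(const char *src)
{
    assert(src != NULL);
    int sum = 0;
    const char *p = src;

    while (1) {
        switch (*p) {
        case '\0':
        case ';':
            goto done;          /* leaves the switch and the loop together */
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            sum += *p - '0';
            break;              /* leaves the switch only; the loop goes on */
        default:
            break;              /* not a digit: skip it */
        }
        p++;
    }

done:
    /* post-loop processing shared by every exit path goes here, once */
    return sum;
}
```

Here `break` keeps its usual meaning (leave the switch, continue the loop), and the single forward `goto` is the only way out, so the code after the loop does not have to be duplicated.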

On occasion you'll find a language that offers

    break <exp>

where exp is usually a constant whole number, meaning "break out of exp nested blocks". This is an astoundingly stupid idea, as some poor maintainer may later come along and insert another block somewhere in the stack, and now the code does crazy things. (In fact, this exact mistake in a telco switch took out the entire East Coast phone system about 20 years ago.) If you see this construct in your language, use the poor man's substitute instead.

In your example, why is the goto considered a poor man's substitute for your exit construct? Given the two, I think the goto is actually more readable. I definitely agree that break is crap. — Simon Hewitt, Jun 30 '11 at 12:27
@Simon Hewitt: Because the poor man's version can get broken by some maintainer adding a statement before the label. If such a statement is added after the label for the exit case, the compiler will see that the label isn't a label on the block, and complain. It's better conceptually, too: you named the nest of statements, and said exit the nest; that's much more constrained about purpose than a goto, which in general can go anywhere. So you've stated your purpose to the reader more clearly. — Ira Baxter, Jun 30 '11 at 13:39
Question that is in your domain. http://stackoverflow.com/questions/6540423/simple-dynamic-call-graphs-in-java — Chad Brewbaker, Jun 30 '11 at 20:03

score 2 · Accepted Answer · answered Jun 30 '11 at 09:10

Within parsers the use of GOTO is perfectly reasonable. When you get down to a base level, the loops and conditions etc are all implemented as gotos, because that is what processors can do - "take the next instruction to be executed from here".

The only problem with gotos, and the reason they are so often demonised, is that they can be an indication of unstructured code, coming from unstructured thinking. Within modern high level languages, there is no need for gotos, because all of the facilities are available to structure code well, and well structured code implies at least some structured thinking.

So use gotos if they are needed. Don't use them just because you can't be bothered to think things through properly.

Schroedingers Cat


Why just within parsers and not other code? Do you consider the use of GOTO within parsers still reasonable if it is hand-coded rather than auto-generated? Goto's are never actually needed as far as I can see, there are always alternatives. – Simon Hewitt Jun 30 '11 at 12:24
Actually within any code they are a viable possibility. They are never needed, but there are cases where they provide the most appropriate solution to a problem. I think that parsers are one of those situations where they may be appropriate. Personally, I have not used one for decades, so I would agree that they are never needed. Just sometimes they are valid, because the alternatives are more confusing. – Schroedingers Cat Jun 30 '11 at 12:42
Code should be structured to the extent permitted by the problem domain, but if a problem domain is inherently unstructured, trying to impose structure on the code that's supposed to solve it may make things less readable. For example, if a problem is defined by a state machine, using `goto` labels to represent the states may in some cases be clearer and cleaner than using a "state" variable along with a `switch` statement. While performing a "goto" will immediately change state, changing the state variable will effectively only change state if the switch statement is exited and re-entered. – supercat Oct 01 '12 at 18:34
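
The labels-as-states idea from the comment above can be sketched roughly as follows, in C. The recognizer `is_decimal`, its label names and the language it accepts (one or more digits, optionally followed by a '.' and one or more digits) are hypothetical and not taken from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recognizer: accepts digits, optionally followed by
   '.' and more digits. Each label below is one state of the machine. */
bool is_decimal(const char *s)
{
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;

    /* start state: at least one digit is required */
    if (!isdigit(*p)) return false;
    p++;
    goto int_part;

int_part:                       /* inside the integer part */
    if (isdigit(*p)) { p++; goto int_part; }
    if (*p == '.')   { p++; goto frac_start; }
    return *p == '\0';

frac_start:                     /* just after '.': a digit is required */
    if (!isdigit(*p)) return false;
    p++;
    goto frac_part;

frac_part:                      /* inside the fractional part */
    if (isdigit(*p)) { p++; goto frac_part; }
    return *p == '\0';
}
```

Each `goto` is a state transition that takes effect immediately. The equivalent state-variable version would have to assign the variable and then loop back into a `switch` before the new state does anything.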
