Stupid mistakes in C. Break, Switch, If. 1990 Crash of Telephone Network

Question

I was hesitating to ask this, since it seems very easy.

What is wrong in this pseudocode?

In the switching software (written in C), there was:

a long "do... while" construct, which contained
a "switch" statement, which contained
an "if" clause, which contained
a "break," which was intended for the "if" clause
- but instead broke from the "switch" statement.

This caused a crash of the telephone system in 1990 (See: http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/att_collapse.html).

I need a very simple explanation of why this code is wrong. I think the simplest answer is that a break is not possible within an if clause? If so, what statement should be written instead of the break to get the intended effect, which is leaving the if clause?

This sounds odd, as in C the break is only for switch. You can't break out of an if. — Ingo, May 08 '11 at 15:00
like `switch`, all of `do { /* statements */ } while (cond)`, `while (cond) { /* statements */ }`, and `for (expr; expr; expr) { /* statements */ }` "answer" to `break`; unlike `switch`, the iteration statements also "answer" to `continue`. — pmg, May 08 '11 at 15:09
Where did you get the idea that `break` was intended to break out of `if`? I don't see it in the article (you probably misinterpreted something, since the author does not state anything like that directly), and the language has no such feature. — AnT stands with Russia, May 08 '11 at 15:14
The error was that they let some rookie programmer design very performance-sensitive applications. So this wasn't even a programming mistake. Nobody would get the idea to let some nurse student do brain surgery, but apparently the software industry doesn't work like that. It is just as bad now, 20 years later, with roughly 10% of the world's programmers being competent, 40% quacks and 50% rookies. — Lundin, May 08 '11 at 17:25

score 6 · Accepted Answer · edited Jun 20 '20 at 09:12

I suspect that the description / pseudo-code is incorrect when it says:

~~a "break," which was intended for the "if" clause~~

It would make sense if that was meant to be:

a break, which was intended to terminate the do while loop

The problem description then makes sense.

do
{
    ...
    switch (...)
    {
    case ...:
        ...
        break;
    ...
    case ...:
        ...
        if (critical_condition())
            break;  // Intended to exit loop - actually exits switch only
        ...
        break;      // Terminates the case in the switch
    }
} while (!time_to_stop());
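To make the difference concrete, here is a minimal sketch of that structure. The function names, the `limit` and `critical_at` parameters, and the parity-based `switch` are all made up for illustration; nothing here comes from the real switching software. The first function uses the inner `break` and keeps looping; the second records the condition in a flag and tests it outside the `switch`, which is one common way to leave the enclosing loop.

```c
#include <stdbool.h>

/* Counts loop passes. The inner break is "meant" to stop the loop when
   i == critical_at, but it only leaves the switch. */
int passes_with_inner_break(int limit, int critical_at)
{
    int passes = 0;
    int i = 0;
    do {
        switch (i % 2) {            /* two arbitrary cases */
        case 1:
            break;
        case 0:
            if (i == critical_at)
                break;              /* exits the switch only */
            /* ... rest of the case would go here ... */
            break;
        }
        passes++;                   /* still reached after the inner break */
        i++;
    } while (i < limit);
    return passes;
}

/* Same loop, but the condition is recorded in a flag that is tested
   outside the switch, where break applies to the do-while. */
int passes_with_flag(int limit, int critical_at)
{
    int passes = 0;
    int i = 0;
    bool stop = false;
    do {
        switch (i % 2) {
        case 1:
            break;
        case 0:
            if (i == critical_at)
                stop = true;        /* remember that the loop should end */
            break;
        }
        if (stop)
            break;                  /* not inside the switch: exits the do-while */
        passes++;
        i++;
    } while (i < limit);
    return passes;
}
```

With `limit` 10 and `critical_at` 4, the first version runs all 10 passes, while the second stops after 4.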

Reading the URL referenced in the question, the pseudo-code there is:

In pseudocode, the program read as follows:
1  while (ring receive buffer not empty 
          and side buffer not empty) DO

2    Initialize pointer to first message in side buffer
     or ring receive buffer

3    get copy of buffer

4    switch (message)

5       case (incoming_message):

6             if (sending switch is out of service) DO

7                 if (ring write buffer is empty) DO

8                     send "in service" to status map

9                 else

10                    break

                  END IF

11           process incoming message, set up pointers to
             optional parameters

12           break
       END SWITCH

13   do optional parameter work
When the destination switch received the second of the two closely timed messages while it was still busy with the first (buffer not empty, line 7), the program should have dropped out of the if clause (line 7), processed the incoming message, and set up the pointers to the database (line 11). Instead, because of the break statement in the else clause (line 10), the program dropped out of the case statement entirely and began doing optional parameter work which overwrote the data (line 13). Error correction software detected the overwrite and shut the switch down while it couls [sic] reset. Because every switch contained the same software, the resets cascaded down the network, incapacitating the system.
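A small C model of that pseudo-code may help. This is only a sketch of one reading of the article: the enum names, the function names, and the bit flags that record which numbered steps ran are all hypothetical. The first function keeps the `break` in the `else` branch as reported; the second shows the behavior the article says was intended, where no statement is needed to "leave" the `if` because control simply continues after it.

```c
#include <stdbool.h>

enum message_kind { INCOMING_MESSAGE, OTHER_MESSAGE };

/* Bit flags recording which numbered pseudo-code steps ran. */
enum {
    STEP_8_SEND_IN_SERVICE  = 1 << 0,
    STEP_11_PROCESS_MESSAGE = 1 << 1,
    STEP_13_OPTIONAL_PARAMS = 1 << 2
};

/* Mirrors the pseudo-code as reported: break in the else branch. */
unsigned handle_as_reported(enum message_kind kind,
                            bool sender_out_of_service,
                            bool write_buffer_empty)
{
    unsigned steps = 0;
    switch (kind) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service) {
            if (write_buffer_empty)
                steps |= STEP_8_SEND_IN_SERVICE;
            else
                break;              /* leaves the switch, skipping step 11 */
        }
        steps |= STEP_11_PROCESS_MESSAGE;
        break;
    default:
        break;
    }
    steps |= STEP_13_OPTIONAL_PARAMS;   /* runs even if step 11 was skipped */
    return steps;
}

/* Intended behavior per the article: fall out of the if, then do step 11. */
unsigned handle_as_intended(enum message_kind kind,
                            bool sender_out_of_service,
                            bool write_buffer_empty)
{
    unsigned steps = 0;
    switch (kind) {
    case INCOMING_MESSAGE:
        if (sender_out_of_service && write_buffer_empty)
            steps |= STEP_8_SEND_IN_SERVICE;
        /* no else and no break: execution continues after the if */
        steps |= STEP_11_PROCESS_MESSAGE;
        break;
    default:
        break;
    }
    steps |= STEP_13_OPTIONAL_PARAMS;
    return steps;
}
```

For an incoming message with the sender out of service and a non-empty write buffer, the first function records only step 13, while the second records steps 11 and 13.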

~~This agrees with my hypothesis - the pseudo-code in the question is an incorrect characterization of the pseudo-code in the paper.~~

Another reference on the same subject (found via a Google search 'att crash 1990 4ess') says:

Error Description

What was reported in ACM's Software Engineering Notes [Reference 2] is that the software defect was traced to an elementary programming error, which is described as follows:

In the offending "C" program text there was a construct of the form: [Erratic indentation as in original]
/* ``C'' Fragment to Illustrate AT&T Defect */   
do {

      switch expression {

          ...

                case (value):

                        if (logical) {
                                sequence of statements
                                        break
                        }
                        else
                        {
                                another sequence of statements
                        }
                        statements after if...else statement
                }

                statements after case statement

        } while (expression)

        statements after do...while statement
Programming Mistake Described

The mistake is that the programmer thought that the break statement applied to the if statement in the above passage, was clearly never exercised. If it had been, then the testers would have noticed the abnormal behavior and would have been able to corr [sic]

The only caveat to this statement is the following: it is possible that tests applied to the code contain information which would reveal the error; however, if the testers do not examine the output and notice the error, then the deficiency is not with th [sic]

In the case of a misplaced break statement, it is very likely that the error would have been detected.

References

"Can We Trust Our Software?", Newsweek, 29 January 1990.

ACM SIGSOFT, Software Engineering Notes, Vol. 15, No. 2, Page 11ff, April 1990.

Apparently, the programmer really did just think that break would end the if statement; it was a small mental blackout that led to a large real-world blackout.

I don't see it. The article explicitly states that the intended behavior was to proceed to step 11 after the `if`. I.e. the `break` at 10 was *not* supposed to break out of the cycle. Of course, the theory about that `break` being there to break out of `if` doesn't hold water either: `break` doesn't work that way in C and there's no need to "break out" of the last statement of `if` anyway. — AnT stands with Russia, May 08 '11 at 16:01
@Andrey: revised - after finding extra example discussing the faulty code. — Jonathan Leffler, May 08 '11 at 16:23

score 1 · Answer 2 · answered May 08 '11 at 15:38

If I understand it right, the else block containing the incriminated break statement is merely part of that "one line bug", as it's called before. I don't see any good reason for that else to exist there, unless those "certain types of messages" that received optimization were thought to be the only case in which the buffer is non-empty while a message is being processed. The description you linked lacks a good deal of domain knowledge, without which I, at least, cannot fully understand that piece of code. I'll try to give an explanation anyway.

As break statements can only refer to a switch or a loop, I can assume that:

hypothesis #1

the original coder intended to "speed processing of certain types of messages" by cutting the while statement short with such a break. However, the nesting misled him and made him overlook that the break would affect the switch statement and not the while.

hypothesis #2

the original coder really intended to quickly end the switch statement, but put that break too early and forgot to eventually update pointers to optional parameters, e.g. marking somehow that no optional parameters were provided with the current message.

I would thus call it "two lines bug"
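To illustrate hypothesis #2, here is a hedged sketch of what "marking somehow that no optional parameters were provided" could look like. Everything in it is hypothetical (the `handle` function, its parameters, the `-1` return value); it only shows the idea of resetting a pointer before the switch so that an early break cannot leave stale data for the later step.

```c
#include <stdbool.h>
#include <stddef.h>

enum message_kind { INCOMING_MESSAGE, OTHER_MESSAGE };

/* Returns the optional parameter that was processed, or -1 if the
   optional-parameter step was skipped. */
int handle(enum message_kind kind, bool busy, int optional_value)
{
    const int *optional = NULL;     /* reset first: "no optional parameters" */

    switch (kind) {
    case INCOMING_MESSAGE:
        if (busy)
            break;                  /* early exit; the pointer stays NULL */
        optional = &optional_value; /* stands in for setting up the pointers */
        break;
    default:
        break;
    }

    if (optional == NULL)
        return -1;                  /* nothing was set up, so skip the work */
    return *optional;               /* stands in for the optional parameter work */
}
```

Here the early break is harmless because the code after the switch checks whether the pointer was ever set up before using it.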
