I suspect that the description / pseudo-code is incorrect when it says:
a "break," which was intended for the "if" clause
It would make sense if that was meant to be:
- a break, which was intended to terminate the do while loop
The problem description then makes sense.
    do
    {
        ...
        switch (...)
        {
        case ...:
            ...
            break;
        ...
        case ...:
            ...
            if (critical_condition())
                break;  // Intended to exit loop - actually exits switch only
            ...
            break;      // Terminates the case in the switch
        }
    } while (!time_to_stop());
Reading the URL referenced in the question, the pseudo-code there is:
In pseudocode, the program read as follows:
    1  while (ring receive buffer not empty
            and side buffer not empty) DO
    2    Initialize pointer to first message in side buffer
         or ring receive buffer
    3    get copy of buffer
    4    switch (message)
    5      case (incoming_message):
    6        if (sending switch is out of service) DO
    7          if (ring write buffer is empty) DO
    8            send "in service" to status map
    9          else
    10           break
               END IF
    11       process incoming message, set up pointers to
             optional parameters
    12       break
           END SWITCH
    13   do optional parameter work
When the destination switch received the second of the two closely timed messages while it was still busy with the first (buffer not empty, line 7), the program should have dropped out of the if clause (line 7), processed the incoming message, and set up the pointers to the database (line 11). Instead, because of the break statement in the else clause (line 10), the program dropped out of the case statement entirely and began doing optional parameter work which overwrote the data (line 13). Error correction software detected the overwrite and shut the switch down while it couls [sic] reset. Because every switch contained the same software, the resets cascaded down the network, incapacitating the system.
This agrees with my hypothesis - the pseudo-code in the question is an incorrect characterization of the pseudo-code in the paper.
Another reference on the same subject (found via a Google search 'att crash 1990 4ess') says:
Error Description
What was reported in ACM's Software Engineering Notes [Reference 2] is that the software defect was traced to an elementary programming error, which is described as follows:
In the offending "C" program text there was a construct of the form: [Erratic indentation as in original]
/* ``C'' Fragment to Illustrate AT&T Defect */
do {
switch expression {
...
case (value):
if (logical) {
sequence of statements
break
}
else
{
another sequence of statements
}
statements after if...else statement
}
statements after case statement
} while (expression)
statements after do...while statement
Programming Mistake Described
The mistake is that the programmer thought that the break statement applied to the if statement in the above passage, was clearly never exercised. If it had been, then the testers would have noticed the abnormal behavior and would have been able to corr [sic]
The only caveat to this statement is the following: it is possible that tests applied to the code contain information which would reveal the error; however, if the testers do not examine the output and notice the error, then the deficiency is not with th [sic]
In the case of a misplaced break statement, it is very likely that the error would have been detected.
References
1. "Can We Trust Our Software?", Newsweek, 29 January 1990.
2. ACM SIGSOFT, Software Engineering Notes, Vol. 15, No. 2, Page 11ff, April 1990.
Apparently, the programmer really did just think that break would end the if statement; it was a small mental blackout that led to a large real-world blackout.