47

I have recently become a teaching assistant for a university course which primarily teaches C. The course standardized on C90, mostly due to widespread compiler support. One of the most confusing concepts for C newbies with previous Java experience is the rule that variable declarations and code may not be intermingled within a block (compound statement).

This limitation was finally lifted with C99, but I wonder: does anybody know why it was there in the first place? Does it simplify variable scope analysis? Does it allow the programmer to specify at which points of program execution the stack should grow for new variables?
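
For concreteness, here is a minimal made-up example of the kind of code that trips my students up: it is valid C99, but a C90 compiler rejects it:

int has_negative(const int *values, int n)
{
  int i;                    /* fine in C90: declared at the top of the block */

  for (i = 0; i < n; i++)
    if (values[i] < 0)
      break;

  int found = (i < n);      /* error in C90: declaration after a statement; fine in C99 */
  return found;
}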

I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.

RavuAlHemio
  • 1
    There was some discussion about this here: http://stackoverflow.com/questions/6488503/c89-mixing-variable-declarations-and-code I'm not really sure why it was decided like that, but to be honest I kind of like all of the variables being declared at the beginning of the scope. – AusCBloke Oct 22 '11 at 12:22
  • I know it might be an advanced subject, but I'd love to learn how compilers optimize stuff, when they can and when they can't, and what practices are good to keep in mind when it comes to letting the compiler optimize C code. A programmer might not know good practices and produce slow programs because he doesn't know compiler design, but a course about compiler optimisation might do a lot. – jokoon Oct 22 '11 at 22:09

6 Answers

60

In the very beginning of C, the available memory and CPU resources were really scarce, so compilation had to be really fast with minimal memory requirements.

Therefore the C language was designed to require only a very simple compiler which compiles fast. This in turn led to the "single-pass compiler" concept: the compiler reads the source file and translates everything into assembler code as soon as possible - usually while reading the source file. For example, when the compiler reads the definition of a global variable, the appropriate code is emitted immediately.

This trait is still visible in C today:

  • C requires "forward declarations" for everything. A multi-pass compiler could look ahead and deduce the declarations of variables and functions in the same file by itself (see the sketch after this list).
  • This in turn makes the *.h files necessary.
  • When compiling a function, the layout of the stack frame must be computed as soon as possible - otherwise the compiler would have to do several passes over the function body.
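
A small sketch (the function names are invented) of why the forward declaration matters to a single-pass compiler:

long square(long);          /* forward declaration: the type of square() is now known */

long twice_square(long x)
{
  return 2 * square(x);     /* ...so a correct call can be emitted here, even though
                               the definition below has not been read yet */
}

long square(long x)
{
  return x * x;
}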

Nowadays no serious C compiler is still "single pass", because many important optimizations cannot be done within one pass. A little more can be found on Wikipedia.

The standards body took quite some time to relax that "single-pass" requirement with regard to the function body. I assume that other things were more important.

Peter Mortensen
A.H.
  • 7
    "When compiling a function, the layout of the stack frame must be computed as soon as possible" - although the restriction is that variables are declared at the start of a *block*, not necessarily the start of a *function*. So a single-pass C89 compiler would have to move the stack pointer on entry and exit to each block in order to take advantage of the restriction. In which case I think it could just as well move the stack pointer each time a variable is declared in the middle of code (in effect, start an extra scope at that point) and remain single-pass. – Steve Jessop Oct 22 '11 at 15:09
  • 1
    @SteveJessop Indeed I should have written "block" instead of "function", perhaps even "scope". But it is not possible to just move the SP as you wrote, because temporary results pushed onto the stack *before* the new variable couldn't be popped from the stack then. The kind of stack and register management required for that together with "single pass" was simply not possible at that time. – A.H. Oct 22 '11 at 16:03
  • 2
    @A.H: well, then single-pass compilers under those constraints weren't possible even for C89. Clearly in 1989 the compiler technology and capacity existed to remember where on the stack an intermediate result was stored, rather than being able only to push and pop them. Given that C89 broke with any old-tech compilers that couldn't do that, I don't really see why they retained the restriction other than because it was the style that everyone was used to. – Steve Jessop Oct 22 '11 at 16:24
  • 1
    Actually, come to think of it, in C89 the translation limits specify 127 block-scope identifiers in a single block, but only 15 nested levels of compound statements. So it does make sense that the collective burden of creating a group of variables is restricted to occur at only one point per block. I still claim that this burden includes moving the stack pointer, but it doesn't really matter what the burden is, just that the standard mandates support for only a small number of nested blocks. – Steve Jessop Oct 22 '11 at 16:34
  • Likewise (although rather tangentially), C89 only guaranteed that a compiler would give a damn about a function name for the first 6 characters - hence why the standard library functions are so tersely named. – fluffy Oct 22 '11 at 17:12
  • It's funny then that a C# program with the same number of lines compiles in 1/20th of the time. Although it doesn't produce assembly but intermediate code, so that may be related. – Andreas Bonini Oct 22 '11 at 17:49
  • 7
    Even after this explanation, I don't have any clue what the advantage of defining variables at the beginning of a block is. Since you can put arbitrary expressions in the initializers, the compiler still needs to be able to handle arbitrary code mixed with declarations. – Sven Marnach Oct 22 '11 at 21:59
  • @Andreas: given the number of lines in a typical `stdio.h` or `stdlib.h`, you'd have a job *finding* a C# program with as many lines as a C program that's even vaguely comparable to it! – Steve Jessop Oct 23 '11 at 02:04
  • @Steve: you are right, but header files were added (according to the answer) to speed up compilation times... and now they are slowing them down, apparently? – Andreas Bonini Oct 23 '11 at 02:07
  • @Andreas: They sped it up compared with one thing, which is a C program in which everything is fully defined. Assuming your benchmarks are correct, they slow it down compared with a completely different thing invented nearly 30 years later, with the benefit of better compiler techniques and completely different hardware. So there's no particular contradiction in that. – Steve Jessop Oct 23 '11 at 02:11
  • 1
    Although I don't think it's quite true as AH says, that forward declarations in particular imply the need for header files. It's the fact that the C build model permits the compiler to work solely on one source file at a time, and the linker to be very stupid, that implies each source file needs to contain information about everything it uses from other source files. The Java and C# compilation models, on the other hand, allow the compiler to go and look things up outside the source file as needed. It's a more advanced technique, and saves compiling reams of useless stuff in headers. – Steve Jessop Oct 23 '11 at 02:16
  • 2
    I have implemented a single-pass C to assembly compiler where declarations can be made after statements in every block. How it is done: you keep track of local variable allocation until the end of the currently compiled function and calculate the cumulative size of the variables. Obviously, you need this size at the beginning of the function, but you don't have it then. But nothing prevents you from generating code in this way: `goto L1` + `L2: function's body code` + `return` + `L1: stack_pointer -= size_of_locals` + `goto L2`. – Alexey Frunze Feb 01 '13 at 05:41
  • @AlexeyFrunze: Or, alternatively, `sub SP,word offset fooFuncFrame [rest of function] fooFuncFrame equ 20`. A two-pass assembler may need to know on the first pass whether it will need to subtract an 8-bit or 16-bit value, but if the code specifies 16 bits, an assembler will know what operand size to use (16 bits) without having to know the operand value. – supercat Jun 21 '16 at 18:00
  • @supercat That may be unsupported by the assembler. – Alexey Frunze Jun 21 '16 at 20:07
  • 1
    @AlexeyFrunze: In that case, use something like "dw 0xEC81,fooFuncFrame" [or whatever hex values would be appropriate for a "SUB SP,nn" instruction]. Being able to process forward labels for jumps (an ability without which an assembler would be pretty useless) isn't much different from being able to process them for other things. – supercat Jun 21 '16 at 20:24
  • @supercat EQU'd values normally have to be defined before use, and they have neither an address nor an offset. – Alexey Frunze Jun 22 '16 at 05:32
  • @AlexeyFrunze: Assemblers may require varying amounts of trickery to make such things work [e.g. some may allow the technique to be accomplished straightforwardly by using ".set" rather than ".equ"], but I would expect that most would make it possible to accomplish in a single-pass compiler via some means. In worst case, a compiler could buffer object code in RAM until either it encounters the end of a function or it gets too large, and only have to do the "late entry point with jump back" hack in case the function gets too large for the buffer. – supercat Jun 22 '16 at 15:01
  • @AlexeyFrunze: That does remind me of something else I'm curious about, though: why do compilers output human-readable assembly language rather than a format optimized for efficient processing? Given that file I/O was often the limiting factor for compiler performance, I think that could have sped up builds by a factor of three or better in many cases while also making compilers smaller. – supercat Jun 22 '16 at 15:08
9

It was that way because it had always been done that way, it made writing compilers a little easier, and nobody had really thought of doing it any other way. In time people realised that it was more important to make life easier for language users than for compiler writers.

I assume the language designers wouldn't have added such a limitation if it had absolutely no purpose at all.

Don't assume that the language designers set out to restrict the language. Often restrictions like this arise by chance and circumstance.

David Heffernan
5

I guess it should be easier for a non-optimising compiler to produce efficient code this way:

int a;
int b;
int c;
...

Although 3 separate variables are declared, the stack pointer can be incremented just once, without optimising strategies such as reordering, etc.

Compare this to:

int a;
foo();
int b;
bar();
int c;

Incrementing the stack pointer just once here requires a kind of optimisation, although not a very advanced one.
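
A rough sketch of the difference (the "assembly" in the comments is only illustrative pseudo-code, not the output of any particular compiler):

void foo(void);
void bar(void);

void all_at_top(void)           /* C89 style */
{
  int a, b, c;                  /* sub sp, 12  -- the whole frame is known up front,  */
                                /*                one adjustment reserves a, b and c  */
  a = b = c = 0;
  foo();
  bar();
}

void interleaved(void)          /* C99 style, for comparison */
{
  int a = 0;                    /* sub sp, 4   -- reserve a                           */
  foo();                        /* call foo                                           */
  int b = 0;                    /* sub sp, 4   -- reserve b, or patch the total later */
  bar();                        /* call bar                                           */
  int c = 0;                    /* sub sp, 4   -- reserve c                           */
}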

Moreover, as a stylistic issue, the first approach encourages a more disciplined way of coding (no wonder that Pascal also enforces this): all the local variables can be seen in one place and inspected together as a whole. This provides a clearer separation between code and data.

Blagovest Buyukliev
  • 1
    What optimization is required in the second example? Of course you have to wait until the end of the block to know the total increment of the stack pointer needed to allocate all the variables, but this is not so difficult. Also, you can use a technique that is called, if I recall correctly, backpatching: you generate a placeholder for the total size of the variables on the stack and then fill in the correct value when you finish compiling the block. – Giorgio Oct 22 '11 at 20:28
  • +1: I think the stylistic reason is also very important, good that you mentioned it. In fact I was a bit disappointed when this restriction was removed from C. :-) – Giorgio Oct 22 '11 at 20:30
  • 6
    Actually, I'd consider it stylistically superior to declare vars as close to first use as possible, to minimize their scope. But that's a different (contentious) question... – sleske Oct 26 '11 at 15:17
  • 2
    @sleske it's not just to minimize their scope: in many cases it's the only reasonable way to make a variable `const` (e.g. when the value depends on having executed a statement before initialization of the new variable). And if all your variables are declared in the beginning of a block, you have no chance to track accidental change of what is supposed to be a constant. – Ruslan Apr 23 '17 at 15:20
3

Requiring that variable declarations appear at the start of a compound statement did not impair the expressiveness of C89. Anything that one could legitimately do using a mid-block declaration could be done just as well by adding an open brace before the declaration and doubling up the closing brace of the enclosing block. While such a requirement may sometimes have cluttered source code with extra opening and closing braces, such braces would not have been just noise: they would have marked the beginning and end of the variables' scopes.

Consider the following two code examples:

{
  do_something_1();
  {
    int foo;
    foo = something1();
    if (foo) do_something_1(foo);
  }
  {
    int bar;
    bar = something2();
    if (bar) do_something_2(bar);
  }
  {
    int boz;
    boz = something3();
    if (boz) do_something_3(boz);
  }
}

and

{
  do_something_1();

  int foo;
  foo = something1();
  if (foo) do_something_1(foo);

  int bar;
  bar = something2();
  if (bar) do_something_2(bar);

  int boz;
  boz = something3();
  if (boz) do_something_3(boz);
}

From a run-time perspective, most modern compilers probably wouldn't care whether foo is syntactically in scope during the execution of do_something_3(), since they could determine that any value it held before that statement would not be used afterward. On the other hand, encouraging programmers to write declarations in a way which would generate sub-optimal code in the absence of an optimizing compiler is hardly an appealing concept.

Further, while handling the simpler cases involving intermixed variable declarations would not be difficult (even a 1970s compiler could have done it, if the authors had wanted to allow such constructs), things become more complicated if the block which contains intermixed declarations also contains any goto or case labels. The creators of C probably thought that allowing variable declarations to be intermixed with other statements would complicate the standard too much to be worth the benefit.
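
For example (a made-up fragment, with compute() and use() standing in for arbitrary functions), a goto interacts awkwardly with a mid-block declaration:

int compute(int);
void use(int);

void demo(int k)
{
  if (k > 0) goto skip;     /* jumps over the declaration below */

  int n = compute(k);       /* mid-block declaration, allowed in C99 */

skip:
  use(n);                   /* n is in scope here, but if we arrived via the goto
                               its initializer never ran, so its value is indeterminate */
}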

supercat
2

Back in the days of C's youth, when Dennis Ritchie worked on it, computers (the PDP-11, for example) had very limited memory (e.g. 64K words), and the compiler had to be small, so it could only optimize very few things, and very simply. And at that time (I coded in C on a Sun-4/110 in the 1986-89 era), declaring register variables was really useful for the compiler.
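
For instance, a hint like this (just a sketch; modern compilers generally ignore the register hint and do their own allocation) could make a real difference back then:

long sum(const int *p, int n)
{
  register int i;           /* ask the compiler to keep i in a CPU register */
  register long total = 0;

  for (i = 0; i < n; i++)
    total += p[i];
  return total;
}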

Today's compilers are much more complex. For example, a recent version of GCC (4.6) has 5 to 10 million lines of source code (depending on how you measure it), and does a great deal of optimization that did not exist when the first C compilers appeared.

And today's processors are also very different (you cannot assume that today's machines are just like machines from the 1980s, only thousands of times faster and with thousands of times more RAM and disk). Today, the memory hierarchy is very important: cache misses are what the processor spends most of its time on (waiting for data from RAM). But in the 1980s access to memory was almost as fast (or as slow, by current standards) as the execution of a single machine instruction. This is completely false today: to read from your RAM module, your processor may have to wait several hundred nanoseconds, while for data in the L1 cache, it can execute more than one instruction each nanosecond.

So don't think of C as a language very close to the hardware: this was true in the 1980s, but it is false today.

Peter Mortensen
Basile Starynkevitch
  • Actually C still remains VERY close to the hardware. As a matter of fact, the complexity of the compilers (say GCC, as you point out) is there to ensure that C translates to something as close to optimal for the target hardware as possible. – Ahmed Masud Oct 22 '11 at 12:37
  • 4
    What you said here might be interesting to know, but it is largely unrelated to OP's question. – akappa Oct 22 '11 at 12:40
  • @AhmedMasud: even Haskell is compiled to binary, but that does not mean that it is a bare-metal language. – akappa Oct 22 '11 at 12:41
  • You could just as well say "don't think of assembly as a language very close to the hardware". :) – Alexey Frunze Feb 01 '13 at 05:46
  • @AlexeyFrunze this is quite true for ARB_fragment_program/ARB_vertex_program, for example, which on many GPUs doesn't directly map to the hardware instruction set. – Ruslan Aug 08 '16 at 10:40
2

Oh, but you could (in a way) mix declarations and code; it's just that declaring new variables was limited to the start of a block. For example, the following is valid C89 code:

void f()
{
  int a;
  do_something();
  {
    int b = do_something_else();
  }
}
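
By contrast, a small sketch of what a C89 compiler would reject (do_something() and do_something_else() as above):

void g()
{
  int a;
  do_something();
  int b = do_something_else();  /* error in C89: declaration after a statement */
}
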
Peter Mortensen
zvrba