145

I long thought that in C, all variables had to be declared at the beginning of the function. I know that in C99, the rules are the same as in C++, but what are the variable declaration placement rules for C89/ANSI C?

The following code compiles successfully with gcc -std=c89 and gcc -ansi:

#include <stdio.h>
int main() {
    int i;
    for (i = 0; i < 10; i++) {
        char c = (i % 95) + 32;
        printf("%i: %c\n", i, c);
        char *s;
        s = "some string";
        puts(s);
    }
    return 0;
}

Shouldn't the declarations of c and s cause an error in C89/ANSI mode?

Nayuki
mcjabberz
  • 62
    Just a note: variables in ansi C don't have to be declared at the start of a function but rather at the start of a block. So, char c = ... at the top of your for loop is completely legal in ansi C. The char *s, however, would not be. – Jason Coco Nov 13 '08 at 21:56

8 Answers

172

It compiles successfully because GCC allows the declaration of s as a GNU extension, even though it's not part of the C89 or ANSI standard. If you want GCC to diagnose constructs that fall outside those standards, you must pass the -pedantic flag (or -pedantic-errors to turn the resulting warnings into errors).

The declaration of c at the start of a { } block is part of the C89 standard; the block doesn't have to be a function.
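As a minimal sketch of what a strictly C89-style version of the question's loop could look like, the function below moves both declarations to the start of the loop block. The function name `c89_print_chars` and its `count` parameter are hypothetical, introduced only for illustration. With the original ordering, `gcc -std=c89 -pedantic` reports a warning along the lines of "ISO C90 forbids mixed declarations and code"; this version should not trigger it.

```c
#include <stdio.h>

/* Hypothetical helper (not from the question): prints `count` lines the
 * way the question's loop does and returns how many iterations ran. */
int c89_print_chars(int count)
{
    int i;            /* declared at the start of the function block */
    int printed = 0;

    for (i = 0; i < count; i++) {
        /* Both declarations come before any statement in this block. */
        char c = (char)((i % 95) + 32);
        const char *s = "some string";

        printf("%i: %c\n", i, c);
        puts(s);
        printed++;
    }
    return printed;
}
```

Calling `c89_print_chars(10)` reproduces the output of the original program.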

MarcH
mipadi
  • 46
    It is probably worth noting that only the declaration of `s` is an extension (from C89 point of view). The declaration of `c` is perfectly legal in C89, no extensions needed. – AnT stands with Russia Apr 16 '10 at 23:16
  • 8
    @AndreyT: Yeah, in C, variable declarations should be @ the beginning of a _block_ and not a function per se; but people confuse block with function since it's the primary example of a block. – legends2k Jun 14 '12 at 14:01
  • 1
    I moved the comment with +39 votes into the answer. – MarcH Jan 31 '20 at 23:28
81

For C89, you must declare all of your variables at the beginning of a scope block.

So, your char c declaration is valid as it is at the top of the for loop scope block. But, the char *s declaration should be an error.
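If a declaration is wanted after a statement and the code must stay within C89, one option is to open a nested block at that point. The sketch below assumes a hypothetical function `c89_nested_block`; it is one way to apply the rule, not the only one.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns printf's character count plus the
 * length of the string declared in the nested block. */
int c89_nested_block(int i)
{
    char c = (char)((i % 95) + 32);
    int written;

    /* First statement: C89 allows no further declarations in this block. */
    written = printf("%i: %c\n", i, c);

    {
        /* A new block starts here, so declarations are allowed again. */
        const char *s = "some string";

        written += (int)strlen(s);
        puts(s);
    }
    return written;
}
```

The extra braces also limit the scope of `s` to the few lines that use it.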

Nayuki
Kiley Hykawy
  • 3
    Quite correct. You can declare variables at the beginning of any { ... }. – Artelius Nov 13 '08 at 21:55
  • 6
    @Artelius Not quite correct. Only if the curlies are part of a block (not if they are part of a struct or union declaration or a braced initializer.) – Jens Jun 18 '13 at 19:39
  • Just to be pedantic, the erroneous declaration should at least be diagnosed according to the C standard. So it should be an error or a warning in `gcc`. That is, don't assume that a program is compliant just because it compiles. – jinawee Jan 09 '19 at 10:40
  • @Jens how do you declare new variables inside a struct, union or braced initializer? "A block" obviously stands for "a block of code" here. – MarcH Sep 23 '20 at 20:48
  • @MarcH That's not what Artelius said. He said "at the beginning of any { ... }" without qualification. – Jens Sep 24 '20 at 09:46
  • Yes and everyone understands what he meant because there is no other sensible meaning. – MarcH Sep 24 '20 at 19:53
48

Grouping variable declarations at the top of the block is a legacy likely due to limitations of old, primitive C compilers. All modern languages recommend and sometimes even enforce the declaration of local variables at the latest point: where they're first initialized, because this gets rid of the risk of using a random value by mistake. Separating declaration and initialization also prevents you from using "const" (or "final") when you could.

C++ unfortunately keeps accepting the old, top declaration way for backward compatibility with C (one C compatibility drag out of many others...) But C++ tries to move away from it:

  • The design of C++ references does not even allow such top of the block grouping.
  • If you separate declaration and initialization of a C++ local object then you pay the cost of an extra constructor for nothing. If the no-arg constructor does not exist then again you are not even allowed to separate both!

C99 starts to move C in this same direction.
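To illustrate the point about "const", here is a small sketch that needs C99 or later: each local is declared where its value is first known, so it can be const-qualified. The function `trimmed_length` is a hypothetical example, not something from the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of `text` without one trailing newline. */
size_t trimmed_length(const char *text)
{
    if (text == NULL) {
        return 0;   /* a statement before the declarations: fine in C99 */
    }

    /* Declared at the point of initialization, so both can be const. */
    const size_t len = strlen(text);
    const size_t trailing = (len > 0 && text[len - 1] == '\n') ? 1 : 0;

    return len - trailing;
}
```

With C89-style declarations at the top of the block, `len` could not be initialized before the NULL check, so it could not be const either.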

If you are worried about not finding where local variables are declared, then you have a much bigger problem: the enclosing block is too long and should be split.

https://wiki.sei.cmu.edu/confluence/display/c/DCL19-C.+Minimize+the+scope+of+variables+and+functions

afk
MarcH
  • 1
    http://www.learncpp.com/cpp-tutorial/21-basic-addressing-and-variable-declaration/ – MarcH Nov 05 '10 at 11:12
  • See also how forcing variable declarations at the top of the block can create security holes: http://lwn.net/Articles/443037/ – MarcH May 17 '11 at 08:36
  • "C++ unfortunately keeps accepting the old, top declaration way for backward compatibility with C ": IMHO, it's just the clean way to do it. Other language "solve" this problem by always initializing with 0. Bzzt, that only masks logic errors if you ask me. And there are quite a few cases where you NEED declaration without initialization because there are multiple possible locations for initialization. And that's why C++'s RAII is really a huge pain in the butt - Now you need to include a "valid" uninitialized state in each object to allow for these cases. – Jo So Apr 08 '18 at 23:51
  • "quite a few cases" != all cases – MarcH May 24 '18 at 16:01
  • "quite a few cases" * pain in the butt = quite a few cases with pain in the butt – Jo So May 25 '18 at 10:39
  • There's nothing here yet about forbidding the older and unsafe syntax, it's all just about allowing the newer and safer C99 syntax. So "quite a few cases" is off topic. Even if it were on topic, you could still require initialization and then re-assign later. Or use intermediate variables. Many choices easily optimized by the compiler. – MarcH May 30 '18 at 04:04
  • Note my point about RAII, which *does* forbid declaration-only (in the sense that declaration (a.k.a. *R*esource *A*cquisition) *I*s *I*nitialization). It's not possible to work around that with intermediate variables when you are e.g. initializing in branches based on a condition - the declared-and-initialized there goes out of scope so you can't reuse it after the branch. Really the only options are 1) Make a pseudo-constructor which does not idiomatically initialize (e.g. open a file), or 2) Allocate on the heap. Both options suck, they are just unnecessary noise. – Jo So May 30 '18 at 16:42
  • @JoSo: Deterministic initialization with zero, or requiring explicit initialization in cases where code that uses a value could be reached before code that sets it, are both reasonable approaches. While it may be useful to have a way of saying "trust me--you don't need to initialize it because the code which sets it will run before code that reads it" to invite a compiler to skip an unnecessary implicit initialization, the C/C++ approach offers the "worst of both worlds" IMHO. – supercat Jun 12 '18 at 20:00
  • @supercat, I can't figure what you mean by "the C/C++ approach". The approach of not implicitly initializing automatic variables? As I said, implicitly default-initializing very often just hides the bugs. It prevents any possibility of an uninitialized read being detected at compile time, and it prevents many possibilities of detecting the rest of them at runtime. A crash resulting from random data is much easier to detect than a spurious zero somewhere. – Jo So Jun 12 '18 at 21:11
  • Of course, sometimes a default-initialized variable (like an integer set to zero) is just what you want, but more often than not relying on defaults hints that the data is denormalized anyway. – Jo So Jun 12 '18 at 21:11
  • 1
    @JoSo: I'm confused why you think having reads of uninitialized variables yield arbitrary effects will make programming mistakes easier to detect than having them yield either a consistent value or a deterministic error? Note that there's no guarantee that a read of uninitialized storage will behave in a fashion that's consistent with any bit pattern the variable could have held, nor even that such a program will behave in a fashion consistent with the usual laws of time and causality. Given something like `int y; ... if (x) { printf("X was true"); y=23;} return y;`... – supercat Jun 12 '18 at 22:18
  • ...a compiler could determine that X "must" be true when the test is reached, and therefore report that it is, whether it actually is or not. Somehow that's supposed to make it easier for a program to figure out that `y` wasn't initialized? – supercat Jun 12 '18 at 22:19
  • @supercat, `or a deterministic error?` That would be great, only I don't know a way to get that. As for `arbitrary effects`, they lead to easily detectable errors much more consistently than zero initialized variables do. (NULL pointers being the exception in some cases). As for `a compiler could determine that X "must" be true`, yes, I've heard compilers do all sorts of terrible things in the name of "optimization". But I haven't seen any compiler do that particular thing we're talking about. – Jo So Jun 13 '18 at 10:23
  • Well, you can often initialize to `-1337` when you're pretty sure that will never be a valid value. But typically it's much easier to just not initialize and get a nice compiler warning for a possibly uninitialized variable. And, what particular value to use instead of `-1337` varies from case to case, so it can't be encoded in the type definition. The only "canonical" default value is all-zeros, and typically, that doesn't lead to an easily detectable error. – Jo So Jun 13 '18 at 10:28
  • 1
    @JoSo: For pointers, especially on implementations which trap operations on `null`, all-bits-zero is often a useful trap value. Further, in languages which explicitly specify that variables default to all-bits-zero, reliance upon that value *isn't an error*. Compilers don't *yet* tend to get overly wacky with their "optimizations", but compiler writers keep trying to get more and more clever. A compiler option to initialize variables with deliberate pseudo-random variables might be useful for identifying faults, but merely leaving storage holding its last value can sometimes mask faults. – supercat Jun 13 '18 at 14:50
23

From a maintainability, rather than syntactic, standpoint, there are at least three trains of thought:

  1. Declare all variables at the beginning of the function so they'll be in one place and you'll be able to see the comprehensive list at a glance.

  2. Declare all variables as close as possible to the place they're first used, so you'll know why each is needed.

  3. Declare all variables at the beginning of the innermost scope block, so they'll go out of scope as soon as possible and allow the compiler to optimize memory and tell you if you accidentally use them where you hadn't intended.

I generally prefer the first option, as I find the others often force me to hunt through code for the declarations. Defining all variables up front also makes it easier to initialize and watch them from a debugger.

I'll sometimes declare variables within a smaller scope block, but only for a Good Reason, of which I have very few. One example might be after a fork(), to declare variables needed only by the child process. To me, this visual indicator is a helpful reminder of their purpose.
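The fork() case might look like the following sketch on a POSIX system. The function name `run_child_and_wait` and the exit code 7 are arbitrary choices for illustration; the child-only and parent-only variables each live in their own block.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child exit with a fixed code,
 * and returns that code to the caller (or -1 on failure). */
int run_child_and_wait(void)
{
    pid_t pid = fork();

    if (pid < 0) {
        return -1;                  /* fork failed */
    }

    if (pid == 0) {
        /* Child-only variable: declared inside the child's block. */
        int child_exit_code = 7;    /* arbitrary value for illustration */

        _exit(child_exit_code);
    }

    {
        /* Parent-only variable, scoped to the block that needs it. */
        int status = 0;

        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    }
}
```

Both blocks begin with their declarations, so the layout stays valid under C89 rules as well.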

Adam Liss
  • 31
    I use option 2 or 3 so it is easier to find the variables -- because the functions shouldn't be so big that you can't see the variable declarations. – Jonathan Leffler Nov 15 '08 at 15:07
  • 8
    Option 3 is a non-issue, unless you use a compiler from the 70s. – edgar.holleis Nov 05 '10 at 11:45
  • 17
    If you used a decent IDE, you wouldn't need to go code hunting, because there should be an IDE-command to find the declaration for you. (F3 in Eclipse) – edgar.holleis Nov 05 '10 at 11:47
  • 4
    I don't understand how you can ensure initialization in option 1; many times you can only get the initial value later in the block, by calling another function or performing a calculation, maybe. – Plumenator May 20 '11 at 12:03
  • 4
    @Plumenator: option 1 doesn't ensure initialization; I chose to initialize them upon declaration, either to their "correct" values or to something that will guarantee the subsequent code will break if they're not set appropriately. I say "chose" because my preference has changed to #2 since I wrote this, perhaps because I'm using Java more than C now, and because I have better dev tools. – Adam Liss May 28 '11 at 22:27
  • 2
    `Declare all variables at the beginning of the function so they'll be in one place and you'll be able to see the comprehensive list at a glance` but a) how is that useful or meaningful information without knowing the value of those variables, or how they will be modified? and b) that's 100% possible with static analysis; that should be automated by your code editor – iono Jul 30 '19 at 03:43
  • 2
    @iono You're about a decade late to the game, and tools are much better now. :) – Adam Liss Aug 01 '19 at 00:17
  • @AdamLiss unfortunately, my CS lecturer *isn't*; I arrived at this question because we're being forced to pre-declare variables **in our pseudocode**. ugh... – iono Aug 01 '19 at 11:40
  • @AdamLiss How would I see a comprehensive list of variables used in a function inside a modern-day IDE like Visual Studio or even a less-heavyweight one like PowerShell ISE? – Zian Choy Dec 28 '21 at 01:30
  • With very short functions, you can do all 3 at the same time – Philippe Carphin Apr 19 '22 at 17:15
8

As noted by others, GCC is permissive in this regard (and possibly other compilers, depending on the arguments they're called with) even when in 'C89' mode, unless you use 'pedantic' checking. To be honest, there are not many good reasons to not have pedantic on; quality modern code should always compile without warnings (or very few where you know you are doing something specific that is suspicious to the compiler as a possible mistake), so if you cannot make your code compile with a pedantic setup it probably needs some attention.

C89 requires that variables be declared before any other statements within each block; later standards permit declaration closer to use (which can be both more intuitive and more efficient), notably the simultaneous declaration and initialization of a loop control variable in 'for' loops.

Gaidheal
3

As has been noted, there are two schools of thought on this.

1) Declare everything at the top of functions because the year is 1987.

2) Declare closest to first use and in the smallest scope possible.

My answer to this is DO BOTH! Let me explain:

For long functions, 1) makes refactoring very hard. If you work in a codebase where the developers are against the idea of subroutines, then you'll have 50 variable declarations at the start of the function and some of them might just be an "i" for a for-loop that's at the very bottom of the function.

I therefore developed declaration-at-the-top-PTSD from this and tried to do option 2) religiously.

I came back around to option one because of one thing: short functions. If your functions are short enough, then you will have few local variables and since the function is short, if you put them at the top of the function, they will still be close to the first use.

Also, the anti-pattern of "declare and set to NULL" when you want to declare at the top but you haven't made some calculations necessary for initialization is resolved because the things you need to initialize will likely be received as arguments.

So now my thinking is that you should declare at the top of functions and as close as possible to first use. So BOTH! And the way to do that is with well divided subroutines.

But if you're working on a long function, then put things closest to first use because that way it will be easier to extract methods.

My recipe is this. For all local variables, take the variable and move its declaration to the bottom, compile, then move the declaration to just before the compilation error. That's the first use. Do this for all local variables.

int foo = 0;
<code that uses foo>

int bar = 1;
<code that uses bar>

<code that uses foo>

Now, define a scope block that starts before the declaration and move the end until the program compiles:

{
    int foo = 0;
    <code that uses foo>
}

int bar = 1;
<code that uses bar>

>>> First compilation error here
<code that uses foo>

This doesn't compile because there is some more code that uses foo. We can notice that the compiler was able to go through the code that uses bar because it doesn't use foo. At this point, there are two choices. The mechanical one is to just move the "}" downwards until it compiles, and the other choice is to inspect the code and determine if the order can be changed to:

{
    int foo = 0;
    <code that uses foo>
}

<code that uses foo>

int bar = 1;
<code that uses bar>

If the order can be switched, that's probably what you want because it shortens the lifespan of temporary values.

Another thing to note: does the value of foo need to be preserved between the blocks of code that use it, or could it just be a different foo in each? For example:

int i;

for(i = 0; i < 8; ++i){
    ...
}

<some stuff>

for(i = 3; i < 32; ++i){
    ...
}

These situations need more than my procedure. The developer will have to analyse the code to determine what to do.

But the first step is finding the first use. You can do it visually but sometimes, it's just easier to delete the declaration, try to compile and just put it back above the first use. If that first use is inside an if statement, put it there and check if it compiles. The compiler will then identify other uses. Try to make a scope block that encompasses both uses.

After this mechanical part is done, then it becomes easier to analyse where the data is. If a variable is used in a big scope block, analyse the situation and see if you're just using the same variable for two different things (like an "i" that gets used for two for loops). If the uses are unrelated, create new variables for each of these unrelated uses.
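For the two-loop case above, a possible end result in C99 or later is sketched here. The loop bounds come from the example; the loop bodies (summing the indices) and the function name `sum_two_ranges` are assumptions added only so that the sketch compiles and does something observable.

```c
/* Hypothetical helper: the two unrelated uses of "i" each get their
 * own loop-scoped variable instead of sharing one declared up front. */
int sum_two_ranges(void)
{
    int total = 0;

    for (int i = 0; i < 8; ++i) {
        total += i;             /* placeholder body: adds 0..7 */
    }

    /* <some stuff> from the original would go here. */

    for (int j = 3; j < 32; ++j) {
        total += j;             /* placeholder body: adds 3..31 */
    }

    return total;
}
```

Because neither counter is visible outside its loop, the compiler rejects any accidental use of a stale value between or after the loops.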

0

I will quote some statements from the manual for gcc version 4.7.0 for a clear explanation.

"The compiler can accept several base standards, such as ‘c90’ or ‘c++98’, and GNU dialects of those standards, such as ‘gnu90’ or ‘gnu++98’. By specifying a base standard, the compiler will accept all programs following that standard and those using GNU extensions that do not contradict it. For example, ‘-std=c90’ turns off certain features of GCC that are incompatible with ISO C90, such as the asm and typeof keywords, but not other GNU extensions that do not have a meaning in ISO C90, such as omitting the middle term of a ?: expression."

I think the key point of your question is why gcc does not enforce C89 even when the option "-std=c89" is used. I don't know the version of your gcc, but I don't think it makes a big difference. The gcc developers tell us that the option "-std=c89" only turns off the extensions that contradict C89; it does not affect extensions that have no meaning in C89. The extension that lifts the restriction on the placement of variable declarations belongs to the latter group: it does not contradict C89.

To be honest, at first sight of the option "-std=c89", everyone will think that gcc should then conform to C89 completely. But it doesn't. As for whether declaring all variables at the beginning is better or worse, that is just a matter of habit.

junwanghe
  • conforming doesn't mean not accepting extensions: as long as the compiler compiles valid programs and produces any required diagnostics for others, it conforms. – Remember Monica Sep 23 '12 at 21:28
  • 1
    @Marc Lehmann, yes, you are right when the word "conform" is used to differentiate compilers. But when the word "conform" is used to describe some usages, you can say "A usage does not conform to the standard." And all beginners are of the opinion that usages which don't conform to the standard should cause an error. – junwanghe Sep 25 '12 at 14:31
  • @Marc Lehmann, by the way, there is no diagnostic when gcc sees a usage that does not conform to the C89 standard. – junwanghe Sep 25 '12 at 14:34
  • Your answer is still wrong, because claiming "gcc does not conform" is not the same thing as "some user program does not conform". Your usage of conform is simply incorrect. Besides, when I was a beginner I wasn't of the opinion you state, so that is wrong also. Lastly, there is no requirement for a conforming compiler to diagnose non-conforming code, and in fact, this is impossible to implement. – Remember Monica Jul 23 '14 at 21:17
-3

Should you declare all variables at the top or "locally" in the function? The answer is:

It depends on what kind of system you are using:

1/ Embedded systems (especially life-critical ones, such as in an airplane or car): they often do not allow you to use dynamic memory (e.g. calloc, malloc, new...). Imagine you are working on a very big project with 1000 engineers. What if they allocate new dynamic memory and forget to release it when it is no longer used? If the embedded system runs for a long time, this will lead to stack overflow and the software will become corrupted. Quality is not easy to ensure (the best way is to ban dynamic memory).

If an airplane runs for 30 days without being turned off, what happens if the software becomes corrupted while the airplane is still in the air?

2/ Other systems such as web or PC (with a large memory space):

You should declare variables "locally" to optimize memory usage. If such a system runs for a long time and a stack overflow happens (because someone forgot to release dynamic memory), just do the simple thing and reset the PC :P It has no impact on lives.

Dang_Ho
  • 1
    I'm not sure this is correct. I guess you're saying it's easier to audit for memory leaks if you declare all your local variables in one place? That *may* be true, but I'm not so sure I buy it. As for point (2), you say declaring the variable locally would "optimize the memory usage"? This is theoretically possible. A compiler could choose to resize the stack frame over the course of a function to minimize memory usage, but I'm not aware of any that do this. In reality, the compiler will just convert all the "local" declarations to "function-start behind the scenes." – QuinnFreedman May 06 '20 at 02:45
  • 1/ Embedded systems sometimes do not allow dynamic memory. So if you declare all variables at the top of the function, then when the source code is built, the compiler can compute the number of bytes needed on the stack to run the program. But with dynamic memory, the compiler cannot do the same. – Dang_Ho May 07 '20 at 05:13
  • 2/ If you declare a variable locally, that variable only exists inside the "{}" open/close brackets. So the compiler can release the variable's space once the variable is out of scope. That may be better than declaring everything at the top of the function. – Dang_Ho May 07 '20 at 05:19
  • I think you are confused about static vs dynamic memory. Static memory is allocated on the stack. All variables that are declared in a function, no matter where they are declared, are allocated statically. Dynamic memory is allocated on the heap with something like `malloc()`. Although I've never seen a device that is incapable of it, it is best practice to avoid dynamic allocation on embedded systems ([see here](http://mil-embedded.com/articles/justifiably-apis-militaryaerospace-embedded-code/)). But that has nothing to do with where you declare your variables in a function. – QuinnFreedman May 07 '20 at 17:50
  • Ya, I'm talking about dynamic memory if we declare a variable in the middle of a function. For example: if(isHWInterrupt) int a; else char b; When the compiler compiles this, it cannot know whether the HW interrupt will happen or not. So it has 2 choices: 1. Create space on the stack for both int a and char b. 2. Create the variable at run time, store it in dynamic memory space, then release the variable after the condition ends. The second choice uses dynamic memory. So if dynamic memory is banned, declaring a variable in the middle of a function is banned also. – Dang_Ho May 09 '20 at 03:14
  • 1
    While I agree that this would be a reasonable way to operate, it's not what happens in practice. Here is the actual assembly for something very much like your example: https://godbolt.org/z/mLhE9a. As you can see, on line 11, `sub rsp, 1008` is allocating space for the whole array *outside* of the if statement. This is true for `clang` and `gcc` at every version and optimization level I tried. – QuinnFreedman May 10 '20 at 04:58