39

I'm currently reading "Expert C Programming - Deep C Secrets", and just came across this:

The storage class specifier auto is never needed. It is mostly meaningful to a compiler-writer making an entry in a symbol table — it says "this storage is automatically allocated on entering the block" (as opposed to statically allocated at compiletime, or dynamically allocated on the heap). auto is pretty much meaningless to all other programmers, since it can only be used inside a function, but data declarations in a function have this attribute by default.

I saw that someone asked about the same thing here, but they don't have any answer and the link given in comments only explains why there's such a keyword in C, inherited from B, and the differences with C++11 or pre-C++11.

I'm posting anyway to focus on the part stating that the auto keyword is somehow useful in compiler writing: what is the idea, and what is the connection with a symbol table?

I want to stress that I am asking only about a potential use when programming a compiler in C (not when coding a C compiler).

To clarify, I asked this question because I'd like to know if there's an example of code where auto can be justified, because the author stated there would be, when writing compilers.

The whole point is that I think I have understood auto (inherited from B, where it was mandatory, but useless in C), but I can't imagine any example where using it is useful (or at least not useless).

It really seems that there isn't any reason at all to use auto, but is there any old source code or something like that corresponding to the quoted statements?

Chi_Iroh
  • 5
    Compilers issue warnings as well as generating code, for example reading an uninitialised variable, or returning a pointer to a variable, and for that it needs to know if it was static or auto. – Weather Vane Jun 21 '23 at 21:34
  • 5
    Remember, there's a lot of things in the C specification that seemed like a great idea at the time, but were later proven to be kind of useless. There's a lot of feedback between the C and C++ standards teams, where C is probably like "oh, yeah, `auto` meaning "figure it out" does make a lot more sense, but oh well." – tadman Jun 21 '23 at 21:38
  • @WeatherVane Maybe I'm not understanding correctly what you meant, but here I'm expecting to know which (if any) examples in C may use auto. – Chi_Iroh Jun 21 '23 at 21:38
  • 1
    Just because you don't need to explicitly say `auto` doesn't mean the compiler doesn't need that information, which will be available via the symbol table. As your quote says: "a compiler-writer making an entry in a symbol table". – Weather Vane Jun 21 '23 at 21:40
  • 6
    The [book you quote from](https://www.amazon.co.uk/Expert-Programming-Peter-van-Linden/dp/0131774298) seems to be from 1994 (or before). So yeah, maybe it was true 29 years ago, but not so much these days. – pmacfarlane Jun 21 '23 at 21:42
  • 5
    Aside: I believe C23 is going to re-purpose `auto` to be more like C++. – pmacfarlane Jun 21 '23 at 21:47
  • @pmacfarlane I'm aware of this, I thought that this may not be true nowadays, but even if it's only an old rare artifact, I'd only like to fully understand the quotes, eventually seeing outdated source code to understand this, that's purely theoretical. – Chi_Iroh Jun 21 '23 at 21:48
  • 3
    "I can't imagine any example when using it is useful" -- I notice the book author claimed it to be useful without actually explaining how or why. Perhaps they were mistaken, or speaking of some '80s or early-'90s "dumb" compilers that needed more hints? – Dave S Jun 21 '23 at 21:49
  • 1
    @Chi_Iroh, A singular use case for `auto` I've seen was when the attributes of an object like `static`, `extern`, `auto`, .... were rolled into a macro. In the case when no attributes were needed, the macro was assigned `auto` rather than _nothing_. – chux - Reinstate Monica Jun 21 '23 at 22:26
  • Oh, fun usage ! I like it (theoretically, of course). – Chi_Iroh Jun 21 '23 at 22:39
  • 1
    I think it simply means that compilers always know the storage class of a variable, so even if you don't write a storage class, the compiler remembers it the same as if you typed `auto`, and the compiler frequently has to check if variables are `auto`, so compiler writers think about it a lot even though programmers don't. – user253751 Jun 22 '23 at 22:12
  • 3
    @pmacfarlane That's not really an aside - being able to repurpose a keyword like that means two things: the keyword had no useful, actual syntactical meaning, and no one used it anyway so the repurpose wouldn't break existing code. And from that, it can be inferred `auto` was of no practical use to anyone, compiler writer or otherwise. – Andrew Henle Jun 22 '23 at 23:00
  • 1
    This question happens to be well received, but otherwise *[Retrocomputing](https://retrocomputing.stackexchange.com/tour)* is happy to take such questions. – Peter Mortensen Jun 23 '23 at 14:32
  • 1
    Notwithstanding your "insist" and "clarify" paras, there's two ways of reading this. The first is "Can a compiler writer benefit from being able to use 'auto' in the code he's writing?" The second is "Can a compiler writer benefit from being able to assume that other programmers use 'auto' where appropriate in the code his compiler will be processing?". – Mark Morgan Lloyd Jun 24 '23 at 15:27
  • 1
    @MarkMorganLloyd What do you think about my edit ? Do I need to clarify a bit more ? – Chi_Iroh Jun 24 '23 at 19:29
  • 1
    @Chi_Iroh Frankly I don't think there's very much can- or needs to- be done, I was thinking of commenting earlier and possibly should have. I think the important thing is that the community appears to have reached a consensus, and the bottom line is that when B and C were designed the process of designing a language and compiler was far less understood than it is today, and- of course- the amount of space available for the code and data associated with compile-time inference was limited. – Mark Morgan Lloyd Jun 25 '23 at 06:56

5 Answers

61

Author answer: I just emailed Mr Van der Linden, and here is what he said:

Yes, I agree with the people who answered on stack overflow. I don't know for certain, because I never used the language B, but it seems highly plausible to me that "auto" ended up in C because it was in B.

Even when I was professionally kernel and compiler programming in C in the 1980's, I never saw any code that I can recall that used "auto".

The key takeaway is that the auto keyword doesn't add any extra information, and thus is redundant and unneeded. It was a mistake to bring it into C!

I also asked for some explanation about what he meant by speaking about compiler writing and symbol table. Here is his response:

Say you are writing a compiler that will translate C source code into linker objects (object files that can be linked).

Whenever your lexer (front end of the compiler) finds a sequence of characters that form a user-defined symbol (might be a variable, might be a function name, might be a constant, etc), the compiler will store that name in a table called the "symbol table". It will also store everything else it knows about the symbol - if it is a variable, it will store its type, if a constant it will store the value, if a function it will note that it can be invoked, etc etc. It will also store the scope of the name (the lines of code in which this symbol is known). The symbol table is one of the core data structures of a compiler, and some of it is carried forward into the object file. The object file needs to know any names that are to be addressable by external code objects, so the linker can associate the use of a name with the object in which it is stored.

Then later, when the compiler comes across the same name, the compiler looks in the symbol table to see if it knows all about the name already. One of the useful items to store about a name is "where the compiler will allocate storage for it". That storage has to be maintained as long as the symbol remains in scope. So it is useful for the symbol table to know where it should allocate the storage at runtime. I gave 3 examples of different places where a variable might be stored. The "auto" keyword tells the compiler "this is a variable, and you should store this on the stack and its scope is the function it is declared in".

Only, the compiler doesn't need to be told this, because this is already true for all variables declared within a function. I hope this explanation makes sense.
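To make the description above more concrete, here is a minimal sketch in C of what such a symbol table entry could look like. Every name in it (`storage_kind`, `symbol`, `symbol_table`, `declare`, `lookup`) is made up for illustration and is not taken from any real compiler. It only shows that the storage location can be derived from the scope, without any `auto` keyword.

```c
#include <stddef.h>
#include <string.h>

/* Where the storage for a name lives at run time (the three places
   named in the book). STORAGE_DYNAMIC is listed only for completeness:
   it never results from a declaration. */
enum storage_kind { STORAGE_STATIC, STORAGE_AUTOMATIC, STORAGE_DYNAMIC };

/* One symbol table entry; all fields are illustrative assumptions. */
struct symbol {
    const char *name;
    const char *type_name;        /* e.g. "int" */
    enum storage_kind storage;
    int scope_depth;              /* 0 = file scope, 1+ = inside a function */
};

#define MAX_SYMBOLS 64

struct symbol_table {
    struct symbol entries[MAX_SYMBOLS];
    size_t count;
};

/* Records a declaration. No keyword is needed to choose the storage:
   block scope implies automatic storage unless `static` was written. */
const struct symbol *declare(struct symbol_table *table, const char *name,
                             const char *type_name, int scope_depth,
                             int has_static_keyword) {
    if (table->count == MAX_SYMBOLS)
        return NULL;              /* table full */
    struct symbol *entry = &table->entries[table->count++];
    entry->name = name;
    entry->type_name = type_name;
    entry->scope_depth = scope_depth;
    entry->storage = (scope_depth == 0 || has_static_keyword)
                         ? STORAGE_STATIC
                         : STORAGE_AUTOMATIC;
    return entry;
}

/* Looks a name up, most recent declaration first. */
const struct symbol *lookup(const struct symbol_table *table, const char *name) {
    for (size_t i = table->count; i > 0; i--)
        if (strcmp(table->entries[i - 1].name, name) == 0)
            return &table->entries[i - 1];
    return NULL;
}
```

With this sketch, `declare(&table, "i", "int", 1, 0)` records `i` with automatic storage, which is exactly what writing `auto int i;` in the source would have said.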

I guess I completely misunderstood his statements: I thought that auto might have some use when writing a compiler in C, in the code dealing with the symbol table, but he meant that auto is useless, yet C compiler writers must handle and understand it. I nevertheless asked him to confirm my mistake, and it was indeed a misunderstanding on my part:

Perhaps the best way to think about this is:

  1. "auto" has no semantic effect in C
  2. we think it came over from B, but don't know for sure.
  3. It conveys info to someone writing a compiler for C code.
  4. But that info is a duplicate of other info that the compiler writer has.
  5. So a compiler writer can take note of either piece of info to update the symbol table
  6. Or indeed, they can check that the two pieces of info are consistent, and if not, issue an error message.
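Points 5 and 6 can be sketched as a small C function, assuming pre-C23 rules. The enum and function names (`storage_keyword`, `storage_duration`, `resolve_storage`) are hypothetical and the diagnostic text is invented. The function takes the keyword the programmer wrote, if any, and the context of the declaration, then either derives the storage duration or reports that the two pieces of information contradict each other.

```c
#include <stdbool.h>
#include <stdio.h>

/* Storage-class keyword found in the declaration, if any. */
enum storage_keyword { KW_NONE, KW_AUTO, KW_REGISTER, KW_STATIC, KW_EXTERN };

enum storage_duration { DURATION_STATIC, DURATION_AUTOMATIC };

/* Derives the storage duration from the keyword and the context.
   Returns false, after printing a diagnostic, when they are inconsistent. */
bool resolve_storage(enum storage_keyword keyword, bool inside_function,
                     enum storage_duration *duration) {
    if (!inside_function) {
        /* Before C23, `auto` and `register` are not allowed at file scope. */
        if (keyword == KW_AUTO || keyword == KW_REGISTER) {
            fprintf(stderr,
                    "error: file-scope declaration specifies 'auto' or 'register'\n");
            return false;
        }
        *duration = DURATION_STATIC;
        return true;
    }
    /* Inside a function, `static` and `extern` name objects with static
       storage; everything else, with or without `auto`, is automatic. */
    *duration = (keyword == KW_STATIC || keyword == KW_EXTERN)
                    ? DURATION_STATIC
                    : DURATION_AUTOMATIC;
    return true;
}
```

Note that `KW_NONE` and `KW_AUTO` give the same result inside a function, which is the redundancy described in point 4.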
Chi_Iroh
  • 3
    I think the idea is the compiler maintains the _storage category_ in the symbol table (e.g. static, stack, etc.). With the idea that the mapping from source code to compiler internals is thin, keywords more or less map directly: `static` maps to static, `auto` maps to stack, possibly `register` maps to register, etc. (only, auto is implicit, and register is calculated by the compiler and the keyword ignored). – Pablo H Jun 22 '23 at 18:01
  • 7
    There's a bit more to it. The `auto` keyword exists because the earliest C compilers could not compile without it. Another round of syntax lifting could have removed it but it wasn't done. Just regex-removing `auto` wouldn't have worked because variables were declared `auto x;` within the compiler source itself. You see, `int` was implicit. – Joshua Jun 22 '23 at 18:40
  • @Joshua How early? Do you mean pre-ANSI compilers? If so, is there a way to compile pre-ANSI code today? It seems impossible with GCC because there's no std flag prior to std=c89. – Chi_Iroh Jun 22 '23 at 22:12
  • 2
    @Chi_Iroh: Early early. We're talking the original PDP-11 Unix compiler. This weird behavior exists because that compiler depended on it. (You can check pcc and see if it still does or not.) (AFAIK std=c89 can successfully compile all K&R C so no earlier option was needed.) – Joshua Jun 22 '23 at 22:18
  • Ok thank you, I'll definitely take a look about pcc and other old compilers. – Chi_Iroh Jun 22 '23 at 22:52
39

As far as I can tell from 40+ years of C programming, including compiler work, the auto keyword has been completely useless in C for 50 years.

To answer your precise question, Why is auto keyword useful for compiler-writers in C? It isn't useful at all; C compiler writers are just required to parse it as a keyword and implement its semantics as a storage class specifier.

It seems to be a leftover from B, the predecessor to the C language, developed by Ken Thompson and Dennis Ritchie at Bell Labs in the late sixties and early seventies. I have never used B and I doubt Peter, whom I met in 1984 at Inria, has either.

Before C23, auto can only be used to specify automatic storage class for definitions in the scope of a function. This is the default, so auto is fully redundant and as long as the type or another qualifier is specified, auto can be removed. There isn't any case where it was needed, so its inclusion in the C Standard is only rooted in the early history of the C language.
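The redundancy is easy to see in a small sketch. The two hypothetical functions below differ only in the presence of `auto`, and they mean exactly the same thing as long as a type is written next to it (this also remains valid in C23).

```c
/* Sums 1..n with the storage class spelled out explicitly. */
int sum_with_auto(int n) {
    auto int total = 0;   /* `auto` only restates the default */
    auto int i;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same function with `auto` removed: nothing changes. */
int sum_without_auto(int n) {
    int total = 0;
    int i;
    for (i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Both functions return 55 for `n = 10`.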

auto has been used in C++ since C++11 to enable type inference in variable definitions, with or without automatic storage, where the compiler detects the type from that of the initializer.

With the current trend pushing for convergence on a common subset for the C and C++ languages, new semantics have been attached to this keyword in C23 modelled after the C++ semantics, but more restricted:

6.7.1 Storage-class specifiers

auto may appear with all the others except typedef;

auto shall only appear in the declaration specifiers of an identifier with file scope or along with other storage class specifiers if the type is to be inferred from an initializer.

If auto appears with another storage-class specifier, or if it appears in a declaration at file scope, it is ignored for the purposes of determining a storage duration or linkage. It then only indicates that the declared type may be inferred.

Type inference is specified as:

6.7.9 Type inference

Constraints

1 A declaration for which the type is inferred shall contain the storage-class specifier auto.

Description

2 For such a declaration that is the definition of an object the init-declarator shall have one of the forms

direct-declarator = assignment-expression
direct-declarator = { assignment-expression }
direct-declarator = { assignment-expression , }

The declared type is the type of the assignment expression after lvalue, array to pointer or function to pointer conversion, additionally qualified by qualifiers and amended by attributes as they appear in the declaration specifiers, if any. If the direct declarator is not of the form identifier attribute-specifier-sequence(opt), possibly enclosed in balanced pairs of parentheses, the behavior is undefined.
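As a minimal sketch of the first form above (`direct-declarator = assignment-expression`), the hypothetical function below uses inference only when the compiler reports C23 through `__STDC_VERSION__`, and falls back to an explicit type otherwise. The function name `half_of` and the version guard are choices made for this example. A compiler that implements the feature but reports an older version value simply takes the fallback branch.

```c
/* Returns n / 2 as a floating-point value. */
double half_of(int n) {
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
    auto half = n / 2.0;     /* C23: type inferred as double from the initializer */
#else
    double half = n / 2.0;   /* before C23: the type must be spelled out */
#endif
    return half;
}
```

In both branches `half` has type `double` and automatic storage, so `half_of(5)` yields 2.5 either way.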

Type inference is very useful in C++ because types can be very complex and almost impossible to specify in variable definitions, especially with templates. Conversely, using it in C is probably counterproductive, lessening code readability and encouraging laziness and error-prone practices. It was already bad enough to hide pointers behind typedefs; now you can hide them completely with the auto keyword.


To finish on a less serious note, I remember seeing it used in tricky interview tests, where the candidate is asked to find why this code does not compile:

#include <stdio.h>
#include <string.h>

int main(void) {
    char word[80];
    int auto = 0;
    while (scanf("%79s", word) == 1) {
        if (!strcmp(word, "car")
        ||  !strcmp(word, "auto")
        ||  !strcmp(word, "automobile"))
            auto++;
    }
    printf("cars: %d\n", auto);
    return 0;
}
Peter Mortensen
chqrlie
  • 1
    Answer validated because it fits with everything I saw on the Internet, and I'm not surprised at all that even such an experienced C programmer hasn't seen a use for this keyword. The fact that you're getting upvotes also strengthens this point. – Chi_Iroh Jun 21 '23 at 21:53
  • 2
    Isn't `auto` having C++ semantics already a done deal in C23? – pmacfarlane Jun 21 '23 at 22:10
  • 3
    @pmacfarlane It doesn't have C++ semantics exactly, but it can be used to infer a type as in `auto x = foo();` in C23. – Ted Lyngmo Jun 21 '23 at 22:15
  • 1
    @Ted I'm no expert in C++ (working on it!), but isn't that basically also how it works in C++? Is there a difference? – pmacfarlane Jun 21 '23 at 22:17
  • 1
    @pmacfarlane That's one use of `auto` that works in both languages. C++ has a lot more of them. Not all of those uses would be possible in C, and making C++ syntax legal in C for those cases would make header files used by both C and C++ really error prone. – Ted Lyngmo Jun 21 '23 at 22:24
  • 7
    @TedLyngmo: IMHO, It is a sad move to try and converge C toward its distant cousin. – chqrlie Jun 21 '23 at 22:34
  • 2
    @pmacfarlane: I'm afraid you are correct... so much crap went into C23 I missed this one. Answer amended – chqrlie Jun 21 '23 at 22:39
  • 2
    @chqrlie It will spawn a legion of new SO questions where people mess up their types by using `auto` everywhere. It's a very lazy solution that causes more problems than it solves. In saying that, I like it in C++ because it has some very verbose compound types that you don't get in C. – pmacfarlane Jun 21 '23 at 22:43
  • Very fun code, I like it – Chi_Iroh Jun 22 '23 at 08:28
  • 4
    *auto is used in C++ to enable type inference* - For the record, that was new in C++11. In C++ before that, `auto` worked as it does in C before C23, as a storage-class specifier. https://godbolt.org/z/6WeGab6of (and deduced return types for functions only with C++14.) – Peter Cordes Jun 23 '23 at 02:41
  • 1
    @PeterCordes: good point. Answer amended. – chqrlie Jun 23 '23 at 09:45
  • 6
    @pmacfarlane our workplace just instituted a code checker that insists you use `auto` whenever the type can be inferred by context. I really dislike that rule, because sometimes explicit types are useful documentation about what you're working with. – Mark Ransom Jun 24 '23 at 04:40
  • 1
    @MarkRansom: I agree 100%. `auto` is useful in many places in C++, but enforcing it everywhere seems counter productive. In C however, I would support a rule that forbids its usage anywhere. – chqrlie Jun 24 '23 at 10:05
  • @MarkRansom: the blog link in your profile seems broken. – chqrlie Jun 24 '23 at 10:05
  • @chqrlie yes, it's broken. It actually never worked, but I dropped my web provider a couple of years ago so now it's even more broken. – Mark Ransom Jun 24 '23 at 13:04
19

The auto keyword originates from the B language, where it was actually very useful: it allowed the compiler to distinguish local names from non-local names (marked with the extrn keyword):

main()
{
    extrn printf;
    auto x;
    x = 25;
    printf("%d", x);
}

When the B language evolved into C, it preserved a high degree of backward compatibility. B had basically only a single "cell" type, so C introduced type annotations as an optional feature. In C89 and earlier, auto could be used for the same purpose of introducing local names:

main()
{
    extern printf();
    auto x; /* type is int by default */
    x = 42;
    printf("%d", x);
}


After the language's focus shifted towards enforcing type safety, the need for the auto specifier evaporated completely, since the presence of a type annotation was enough to identify local name declarations.

Donald Duck
user7860670
12

First of all, auto is one of 4 or 5 storage-class specifiers: auto, register, static, extern, and, from C11 on, _Thread_local. Every variable in C has an associated storage class from the above list, with auto being the default for variables declared inside a function when none is specified.

From a user's perspective, due to auto being the default, it is rarely[1] necessary to specify it, and arguably doing so is just noise -- the other specifiers stand out more if no specifier is generally used.

From a compiler writer's perspective, however, since every variable has a storage-class specifier, the concept of auto is paramount, and putting yourself in their shoes, you can imagine that somewhere exists an enum enumerating the 4 (or 5) different specifiers and each variable declaration having one of the enum values attached.

The fact that it appears in the compiler does not require that it appears in the language, but it does provide an argument for it: regularity. The concept exists regardless of whether it's directly exposed (or not) and there is little cost in exposing it, so might as well, no?

[1] @BenVoigt mentioned that it may be useful in macros, where the type is user-provided, as it prevents the user from specifying another storage specifier such as static, since the compiler will not accept two storage specifiers.
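A minimal sketch of that macro use, assuming pre-C23 rules as in the comment it comes from. The macro name `MAILBOX` is borrowed from that comment, while its expansion and the function `count_messages` are invented here for illustration.

```c
/* Hypothetical project macro: the type and qualifiers behind it may
   change over time. */
#define MAILBOX volatile int

/* Counts incoming messages in a local mailbox. */
int count_messages(int incoming) {
    /* `auto` pins automatic storage. If MAILBOX is ever redefined to
       include `static`, this declaration ends up with two storage-class
       specifiers and no longer compiles. */
    auto MAILBOX box = 0;
    for (int i = 0; i < incoming; i++)
        box = box + 1;
    return box;
}
```

Because `box` is guaranteed to be automatic, every call starts from zero: `count_messages(3)` returns 3 no matter how many times it is called. If `static` were silently added through the macro and the `auto` were missing, the count would instead accumulate across calls.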

Matthieu M.
  • 3
    Interesting argument, yet there are other concepts for which *regularity* would require more keywords: `static` vs *public* for global symbols, `static` vs *dynamic* for local symbols, `extern` vs *locally defined* symbols, `const` vs *modifiable*... Having a redundant keyword actually goes against one of the C designers' cardinal values: simplicity. It is probable they kept the `auto` keyword for compatibility with ancient code originally written in B as shown in user7860670's answer, where `int` was implicit. – chqrlie Jun 22 '23 at 08:42
  • @Bob__: I did, indeed. – Matthieu M. Jun 22 '23 at 10:48
  • 1
    @chqrlie: Storage-class specifiers are specifically about "variables", by opposition to function: you don't apply `register` to a function. So, with regard to global variables, we're talking `static` vs `extern` (if you want to declare it in a header), and with regard to locally scoped variables, that `static` vs `auto` vs `register`. So for variables it's fairly regular, actually. As for `const` (and `volatile`), it was introduced in ANSI C, and was not present in the K&R version, so it being different is fairly normal. – Matthieu M. Jun 22 '23 at 10:57
  • 6
    There's one case where it is useful in a current-version C program ( > C90 && < C23 ) -- if there's a macro involved. `MAILBOX x;` might be `int`, might be `char`, might be `volatile`, might be `static`. `auto MAILBOX x;` might be `int`, might be `char`, might be `volatile`.... but for certain it is not `static`, and instead of silently broken code you will get a compile error if any future programmer ever tries to add `static` into the #define of `MAILBOX`. – Ben Voigt Jun 22 '23 at 19:23
  • 1
    @BenVoigt: Nice one indeed; I've amended the answer from never to rarely, and mentioned your example. – Matthieu M. Jun 23 '23 at 07:38
0

The auto keyword in C is not very useful to most programmers. However, it can be useful to compiler writers.

The symbol table is a data structure that the compiler uses to keep track of all the variables and functions in a program. When the compiler sees an auto declaration, it knows that the variable will be allocated on the stack. This means that the compiler can optimize the code for that variable, such as by avoiding storing it in a register.

For example, consider the following function:

void soso(int x) {
  int y = x * 2;
  // The compiler could optimize this code if it knew that y was allocated on the stack.
  int z = y + 3;
}

If the compiler knew that y was allocated on the stack, it could avoid storing y in a register. This would save memory and improve the performance of the function.

Of course, the auto keyword is not always necessary to improve the performance of compiler-generated code. However, it can be a useful tool for compiler writers who want to optimize their code.

Here are some additional details about the auto keyword:

The auto keyword is not necessary in C. The compiler automatically assumes that any variable declared inside a function has automatic storage, which is typically on the stack. Before C23, the auto keyword cannot be used to declare variables outside of functions; compilers reject it there. The auto keyword has been part of standard C since C89, so every conforming compiler accepts it.

desertnaut
  • Isn't a register faster than the stack ? I may be wrong but I believe compilers actually move stacked variables into registers whenever they can to speed up the code. – Chi_Iroh Jul 06 '23 at 18:49