
I'm wondering how GCC knows where an error is in the source code when its preprocessor has already removed the comments. I searched for an explanation but couldn't find one. I'll explain what I mean:

I have C code like this:

int main(void)
{
  return /* comment */ ) /* another comment */0;
}

There is a syntax error at the ')' character (column 24). I then run the file through the GCC preprocessor (gcc -E main.c), and the result is:

# 0 "main.c"
# 0 "<built-in>"
# 0 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2
# 1 "main.c"
int main(void)
{
  return ) 0;
}

So far I understand these steps: the preprocessor has removed the comments. But here is the thing: because the comments are gone, the syntax error is now at column 10, not column 24. So how does GCC know exactly where the syntax error was in the original file, as we see in the following output?

main.c: In function ‘main’:
main.c:3:24: error: expected expression before ‘)’ token
    3 |   return /* comment */ ) /* another comment */0;
      |                        ^
main.c:3:24: error: expected statement before ‘)’ token

I found that it has something to do with the #line directive, but there is no #line in the preprocessor output.

So, what is the magic?

Toby Speight
Wolf
  • 7
    The preprocessor isn’t a separate program any more. The compiler keeps track of the position of each token in the source code. – Jonathan Leffler Jul 11 '23 at 10:47
  • 3
    GCC doesn't actually pipe through the preprocessor as early compilers did. It still has access to information that's not emitted when you do `gcc -E`. – Toby Speight Jul 11 '23 at 10:47
  • Okay, so am I right in understanding that GCC uses the preprocessor only for including files and expanding macros? – Wolf Jul 11 '23 at 10:50
  • @EricPostpischil I found it here on Stack Overflow, in one of the posts. It's `#line`, but I don't really understand it. – Wolf Jul 11 '23 at 10:52

1 Answer


The #line preprocessor directive described in the Stack Overflow question you link to is a standard C directive for setting the compiler’s notion of the current source file and line number. It could be used to carry this information through preprocessing so that, after preprocessing, the compiler still knows where each line of code came from. It may also be used by other tools that process or produce source code, such as YACC or Lex, to record where code in their output originated in their input files.
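As a minimal sketch of what #line does, the two functions below report the line number and file name the compiler believes it is looking at. The file name "grammar.y" and the function names are made up for illustration; the point is only that __LINE__ and __FILE__ follow the directive rather than the real file.

```c
/* Sketch: #line overrides the compiler's idea of the current location.
   "grammar.y" is a hypothetical name standing in for a generator's input. */

static int reported_line(void)
{
#line 100 "grammar.y"
    return __LINE__;   /* the line right after the directive is line 100 */
}

static const char *reported_file(void)
{
    return __FILE__;   /* still "grammar.y" until another directive changes it */
}
```

Any diagnostic the compiler issues for code after the directive would likewise be attributed to grammar.y, which is how generated code can point back at its original input.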

However, GCC uses its own non-standard mechanism to convey this and additional information. In the preprocessor output you show, the non-standard directive # 1 "main.c" is essentially equivalent to the standard directive #line 1 "main.c"; both say that the following line came from line 1 of the file “main.c”.

Thus, the origin line information is completely visible in the preprocessor output you show.

However, the GCC form allows additional information. In these lines:

# 1 "/usr/include/stdc-predef.h" 1 3 4
# 0 "<command-line>" 2

the trailing “1 3 4” means this is the start of a new file (1), it comes from a system header so certain warnings should be suppressed (3), and it should be treated as wrapped in an extern "C" block (4). The trailing “2” means it is returning to a prior file after having included another file. (Apparently the inclusion of “/usr/include/stdc-predef.h” resulted in no lines of code, possibly because the file was completely wrapped in an #if/#endif pair whose condition was not satisfied.)

… when its preprocessor has removed comments?

When the GCC preprocessor removes comments, it leaves the new-line characters in, so the line spacing remains unchanged. For example, in processing the input:

abc
/* Multiple-line comment
   consisting of
   three lines */
xyz

the preprocessor produces:

abc



xyz

The output has the same number of lines as the input, so line numbers remain correct after preprocessing. However, column information is not conveyed in this way. Consider this code:

int foo/*comment*/(nuts);

When I compile it with Clang 11.0.0, the error message is:

x.c:1:20: error: a parameter list without types is only allowed in a function
      definition
int foo/*comment*/(nuts);
                   ^

As we can see, the compiler knows the error begins in column 20. However, when I preprocess it with clang -E x.c >x.i and then compile the resulting x.i file, the error message is:

x.c:1:10: error: a parameter list without types is only allowed in a function
      definition
int foo (nuts);
         ^

This demonstrates that the column information is not contained in the preprocessor output. Therefore, we can conclude the compiler maintains this information internally when it is doing both the preprocessing and the compilation. In modern GCC and Clang, preprocessing is integrated into the compilation; it is not actually a separate processing step.
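The difference between line and column information can be reproduced with a toy comment stripper. This is only a sketch, not how GCC or Clang work internally: it handles /* ... */ comments only, ignores string literals and // comments, and all function names are invented for the example. Each comment is replaced by a single space, and any new-lines inside it are kept.

```c
#include <stddef.h>
#include <string.h>

/* Copies src to dst, replacing each block comment with one space while
   keeping the new-lines it contained. Returns the output length. */
static size_t strip_comments(const char *src, char *dst, size_t cap)
{
    size_t n = 0;
    if (cap == 0)
        return 0;
    while (*src != '\0' && n + 1 < cap) {
        if (src[0] == '/' && src[1] == '*') {
            dst[n++] = ' ';                    /* a comment acts as one space */
            src += 2;
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/')) {
                if (*src == '\n' && n + 1 < cap)
                    dst[n++] = '\n';           /* preserve the line count */
                src++;
            }
            if (*src != '\0')
                src += 2;                      /* skip the closing delimiter */
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}

/* Number of new-line characters in text. */
static int count_newlines(const char *text)
{
    int count = 0;
    for (; *text != '\0'; text++)
        if (*text == '\n')
            count++;
    return count;
}

/* 1-based column of the first occurrence of needle, or 0 if absent. */
static int column_of(const char *text, const char *needle)
{
    const char *hit = strstr(text, needle);
    const char *line_start = hit;
    if (hit == NULL)
        return 0;
    while (line_start > text && line_start[-1] != '\n')
        line_start--;
    return (int)(hit - line_start) + 1;
}
```

Applied to int foo/*comment*/(nuts); the stripper produces int foo (nuts); so column_of reports 20 for nuts in the original and 10 in the stripped text, matching the two Clang messages above, while count_newlines gives the same result before and after for the multi-line comment example.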

Another way to see that preprocessing is integrated into compilation is to compile this code:

int foo(nuts);
#error "Stop processing."

If preprocessing were a separate step prior to compilation, the #error directive would cause a message to be printed and would cause the process to exit. However, when this is compiled with Clang, the compiler first prints a message about the int foo(nuts) line and then prints the message for the #error line. This shows the preprocessing is intertwined with the compilation; the preprocessing is being done line-by-line in concert with compilation, so the compiler does not reach the #error directive until it has already processed the prior int foo(nuts); line.

Eric Postpischil