
The Wikipedia article on the C Preprocessor says:

The language of preprocessor directives is only weakly related to the grammar of C, and so is sometimes used to process other kinds of text files.

How is the language of a preprocessor different from C grammar? What are the advantages? Has the C Preprocessor been used for other languages/purposes?

Can it be used to differentiate between inline functions and macros, since inline functions have the syntax of a normal C function whereas macros use slightly different grammar?

Cody Gray - on strike
  • 9
    It says it's "*only* weakly related", meaning it's barely related at all. – David Schwartz Jul 27 '17 at 07:49
  • 4
    The C preprocessor has been used for other purposes. One notable one was as part of the [`imake`](https://en.wikipedia.org/wiki/Imake) system used with X-Windows up to but not including X11R7.0. The preprocessor can't readily be used to differentiate between macros (which it understands) and inline functions (which it has no clues about). – Jonathan Leffler Jul 27 '17 at 07:52

3 Answers

21

The Wikipedia article is not really an authoritative source for the C programming language. The C preprocessor grammar is part of the C grammar as the standard defines it. However, it is completely distinct from the phrase structure grammar: the two are not related at all, except that both operate on C language tokens. Even that overlap is not exact, because the preprocessor has the concept of preprocessing numbers, which means that something like 123_abc is a legal preprocessing token even though it is neither a valid numeric constant nor a valid identifier.

After preprocessing has been completed and before translation using the phrase structure grammar commences (the preprocessor directives have by now been removed, macros expanded, and so forth):

Each preprocessing token is converted into a token. (C11 5.1.1.2p1 item 7)


Using the C preprocessor for any other language is really abuse. The reason is that the preprocessor requires the file to consist of proper C preprocessing tokens; it isn't designed to work with any other language. Even C++, with recent extensions such as raw string literals, cannot be preprocessed by a C preprocessor!

Here's an excerpt from the cpp (GNU C preprocessor) manuals:

The C preprocessor is intended to be used only with C, C++, and Objective-C source code. In the past, it has been abused as a general text processor. It will choke on input which does not obey C's lexical rules. For example, apostrophes will be interpreted as the beginning of character constants, and cause errors. Also, you cannot rely on it preserving characteristics of the input which are not significant to C-family languages. If a Makefile is preprocessed, all the hard tabs will be removed, and the Makefile will not work.

  • 2
    I once tried to use the C preprocessor on Delphi (i.e. Pascal) source code. That didn't work well because Pascal has `'strings'` and not `"strings"`. – Uli Gerhardt Jul 27 '17 at 11:27
2

The preprocessor creates preprocessing tokens, which are later converted into C tokens.

In general the conversion is quite direct, but not always. For example, if you have a conditional preprocessing directive that evaluates to false as in

#if 0
   comments
#endif

then in comments you can write whatever you want: it will be split into preprocessing tokens that are never converted into C tokens, so in this way you can insert non-commented, non-C text inside a C source file.

The only link between the language of the preprocessor and C is that many tokens are defined almost the same way, but not always.

For example, it is valid to have preprocessing numbers (called pp-numbers in the ISO 9899 standard) such as 4MD, which are valid preprocessing numbers but not valid C numbers. Using the ## operator you can build a valid C identifier from such preprocessing numbers. For example

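#define version 4A
#define name TEST_
#define PASTE(x, y) x##y
#define VERSION(x, y) PASTE(x, y)  /* extra level: x and y are expanded before ## is applied */
VERSION(name, version)  /* expands to TEST_4A, a valid C identifier */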

The preprocessor was conceived as a general text-translation tool applicable to any language, without C specifically in mind. In C it is useful mainly for making a clear separation between interfaces and implementations.

alinsoar
  • 1
    Thanks, I couldn't think of an example where a preprocessing token wouldn't be a C token. Originally it was conceived in that way, but that really hasn't been the case at least since ANSI-C. One correction though, in `#if 0`, the stuff in there still must be preprocessing tokens, so you can't write `you can't` there. – Antti Haapala -- Слава Україні Jul 27 '17 at 10:39
  • @AnttiHaapala you can. The last entry in the official Backus-Naur form in ISO 9899 (Annex A) for the nonterminal `preprocessing-token:` says: `each non-white-space character that cannot be one of the above`. In other words, there is a catch-all preprocessing token, `pp-other`, that can hold anything. I have seen this used where non-C code was inserted in C files and kept away from the compiler with #if 0. – alinsoar Jul 27 '17 at 12:06
  • @alinsoar: But see sect 6.4p3: "If a ' or a " character matches the last category, the behavior is undefined." Certainly, gcc complains (non-fatally) about loose apostrophes inside `#if 0` sections. – rici Jul 28 '17 at 05:08
0

The controlling expressions of C preprocessor conditionals are C constant expressions, so the link between the preprocessor and the C language proper is intimate.

#define A (6)
#if A > 5
Here is a 6
#elif A < 0
# error
#endif

This expands to something that is meaningless as C but may be meaningful as text:

Here is a 6

Though the expanded text is invalid C, the preprocessor uses features of C to select the correct conditional lines. The C standard defines this in terms of the constant expression.

From the C99 standard, §6.10.1:

6.10.1 Conditional inclusion

Preprocessing directives of the forms

# if constant-expression new-line group_opt

# elif constant-expression new-line group_opt

check whether the controlling constant expression evaluates to nonzero.

And here is the definition of a constant-expression

6.6 Constant expressions

Syntax:

constant-expression:
   conditional-expression

Description: A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be.

Constraints: Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.

Each constant expression shall evaluate to a constant that is in the range of representable values for its type.

Given the above, it's clear that the preprocessor requires a limited form of C language expression evaluation to work, and therefore some knowledge of the C type system, grammar, and expression semantics.

Community