
People often say that macros are unsafe, that they do not (directly) type-check their arguments, and so on. Worse: when errors occur, the compiler gives intricate and incomprehensible diagnostics, because the macro is just a mess.

Is it possible to use macros in almost the same way as functions, with safe type-checking, avoiding the typical pitfalls, and in a way that makes the compiler give the right diagnostics?

  1. I am going to answer this question myself (a self-answer), in the affirmative.
  2. I want to show you the solutions that I've found to this problem.
  3. Standard C99 will be used and respected, to have a uniform background.
  4. But (obviously there is a "but"), the solution will "define" some kind of "syntax" that people will have to "swallow".
  5. This special syntax is intended to be as simple to write and as easy to understand and handle as possible, minimizing the risk of ill-formed programs and, more importantly, obtaining the right diagnostic messages from the compiler.
  6. Finally, two cases will be studied: "non-returning value" macros (the easy case) and "returning-value" macros (not so easy, but the more interesting case).

Let us quickly remember some typical pitfalls produced by macros.

Example 1

#define SQUARE(X) X*X
int i = SQUARE(1+5);

Intended value of i: 36. True value of i: 11 (with macro expansion: 1+5*1+5). Pitfall!

(Typical) Solution (Example 2)

#define SQUARE(X) (X)*(X)
int i = (int) SQUARE(3.9);

Intended value of i: 15. True value of i: 11 (after macro expansion: (int) (3.9)*(3.9)). Pitfall!

(Typical) Solution (Example 3)

#define SQUARE(X) ((X)*(X))

It works fine with integers and floats, but it is easily broken:

int x = 2;
int i = SQUARE(++x);

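Intended value of i: 9 (because (2+1)*(2+1)...). True value of i: undefined. The macro expands to ((++x)*(++x)), which modifies x twice without an intervening sequence point, so the behavior is undefined; in practice compilers commonly produce 12 (3*4) or 16 (4*4). Pitfall!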
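The first two pitfalls are pure operator-precedence effects and can be reproduced directly. The following is a minimal sketch; the names SQUARE_V1 to SQUARE_V3 and the helper functions are hypothetical, introduced only to keep the three variants apart. The ++x case is left out on purpose, because its result is not defined.

```c
#define SQUARE_V1(X) X*X          /* Example 1: no parentheses at all */
#define SQUARE_V2(X) (X)*(X)      /* Example 2: only the arguments parenthesized */
#define SQUARE_V3(X) ((X)*(X))    /* Example 3: fully parenthesized */

/* Hypothetical helpers, one per case. */
static int pitfall_precedence(void)
{
    int i = SQUARE_V1(1+5);        /* expands to 1+5*1+5 */
    return i;
}

static int pitfall_cast(void)
{
    int i = (int) SQUARE_V2(3.9);  /* expands to (int) (3.9)*(3.9), i.e. 3 * 3.9 */
    return i;
}

static int fixed_cast(void)
{
    int i = (int) SQUARE_V3(3.9);  /* expands to (int) ((3.9)*(3.9)) */
    return i;
}
```

Here pitfall_precedence() and pitfall_cast() both yield 11, while fixed_cast() yields the intended 15.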

A nice method for type-checking in macros can be found here:

However I want more: some kind of interface or "standard" syntax, and a (small) number of easy-to-remember rules. The intent is "be able to use (not to implement)" macros as similar to functions as possible. That means: well written fake-functions.

Why is that interesting in some way?

I think that is an interesting challenge to achieve in C.

Is it useful?

Edit: In standard C it is not possible to define nested functions. But sometimes one would like to define short (inline) functions nested inside other ones. Thus, a function-like prototyped macro would be a possibility to take into account.

pablo1977
  • You might want to consider GCC's extension [statement expressions](http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html) — or there again, you might not. However, I submit that if macros were usable everywhere, then statement expressions would not have been invented. – Jonathan Leffler Aug 26 '13 at 16:58
  • Consider providing reasons why you require this functionality in a macro, instead of simply using an inline function. – tomlogic Aug 26 '13 at 22:26
  • @JonathanLeffler Actually, you are right. To see why, see the new Section 5 that I've added to the answer. It explains an example that Chris Dodd used to break my PRINTINT_SQUARE(X) macro. – pablo1977 Aug 26 '13 at 22:33
  • @tomlogic It is a good question. An idea just came to me: in GCC we have nested functions, but under the strict C99 standard we don't. Sometimes one would be happy with an inline function F2() declared just **inside** another function F1(), but we cannot define it "there". But we could define a macro F2() inside the body of F1(). In addition, F2() would be visible only in the scope of F1(). – pablo1977 Aug 26 '13 at 22:49

2 Answers


This answer is divided into 5 sections:

  1. Proposed solution for block macros.
  2. A brief summary of that solution.
  3. Macro-prototype syntax is discussed.
  4. Proposed solution for function-like macros.
  5. (Important update:) Breaking my code.

(1.) 1st case. Block macros (or non-returning value macros)

Let us consider easy examples first. Suppose that we need a "command" that prints the square of integer numbers, followed by '\n'. We decided to implement it with a macro. But we want the argument to be verified by the compiler as an int. We write:

#define PRINTINT_SQUARE(X) {    \
   int x = (X);              \
   printf("%d\n", x*x);      \
}
  • The parentheses surrounding (X) avoid almost all pitfalls.
  • Moreover, the parentheses help the compiler to properly diagnose syntax errors.
  • The macro parameter X is invoked only once inside the macro. This avoids the pitfall of Example 3 of the question.
  • The value of X is immediately held in the variable x.
  • In the rest of the macro, we use the variable x instead of X.
  • [Important Update:] (This code can be broken: see section 5).

If we systematize this discipline, the typical problems of macros will be avoided.
Now, something like this correctly prints 9:

int i = 3;
PRINTINT_SQUARE(i++);  

Obviously this approach could have a weak point: the variable x defined inside the macro could conflict with other variables in the program that are also called x. This is a scope issue. However, since the macro body is written as a block enclosed by { }, the "inner" variable x is confined to that block and cannot leak into the surrounding code. (Section 5 shows that this is still not enough when the macro argument itself is named x.)

It could be argued that the variable x is an extra object and maybe not desired. But x has (only) temporary duration: it is created at the beginning of the macro, with the opening {, and it is destroyed at the end of the macro, with the closing }. In this way, x works like a function parameter: a temporary variable is created to hold the value of the parameter, and it is discarded when the macro "returns". We are not committing any sin that functions have not already committed!

More important: when the programmer attempts to "call" the macro with a wrong parameter, the compiler gives the same diagnostic that a function would give under the same situation.

So, it seems every macro pitfall has been solved!

However, we have a little syntactical issue, as you can see here:

Therefore, it is imperative (I say) to add a do {} while(0) construct to the block-like macro definition:

#define PRINTINT_SQUARE(X) do {    \
   int x = (X);              \
   printf("%d\n", x*x);      \
} while(0)
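One well-known situation where the brace-only version fails is an unbraced if/else: the call expands to { ... }; and the stray semicolon ends the if statement, so the following else becomes a syntax error. The sketch below assumes this is the kind of issue at stake; the helper print_square_if_positive is hypothetical. With the do { } while(0) form, the call plus its semicolon is exactly one statement.

```c
#include <stdio.h>

#define PRINTINT_SQUARE(X) do {    \
   int x = (X);                    \
   printf("%d\n", x*x);            \
} while(0)

/* Hypothetical helper: returns 1 if a square was printed, 0 otherwise.
   The argument is deliberately not named x (see section 5). */
static int print_square_if_positive(int value)
{
    int printed = 1;
    if (value > 0)
        PRINTINT_SQUARE(value);   /* one statement, so the else still pairs with the if */
    else
        printed = 0;
    return printed;
}
```

Replacing the definition with the brace-only version of the macro makes this function fail to compile at the else.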

Now, this do { } while(0) stuff works fine, but it is unaesthetic. The problem is that it has no intuitive meaning for the programmer. I suggest a more meaningful approach, like this:

#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)
#define PRINTINT_SQUARE(X)        \
  xxbeg_macroblock             \
       int x = (X);            \
       printf("%d\n", x*x);    \
  xxend_macroblock

(The inclusion of } in xxend_macroblock avoids some ambiguity with while(0)). Of course, this syntax is not safe anymore. It has to be carefully documented to avoid misuses. Consider the following ugly example:

{ xxend_macroblock printf("Hello");

(2.) Summarizing

Block-defined macros that do not return values can behave like functions if we write them by following the disciplined style:

#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)

#define MY_BLOCK_MACRO(Par1, Par2, ..., ParN)     \
  xxbeg_macroblock                         \
       desired_type1 temp_var1 = (Par1);   \
       desired_type2 temp_var2 = (Par2);   \
       /*   ...        ...         ...  */ \
       desired_typeN temp_varN = (ParN);   \
       /* (do stuff with objects temp_var1, ..., temp_varN); */ \
  xxend_macroblock
  • A call to the macro MY_BLOCK_MACRO() is a statement, not an expression: there is no "return" value of any kind, not even void.
  • The macro parameters must be used just once, at the beginning of the macro, and pass their values to actual temporary variables with block-scope. In the rest of the macro, only these variables may be used.

(3.) Can we provide an interface for the parameters of the macro?

Although we solved the problem of type-checking of parameters, the programmer cannot figure out what type the parameters "have". It is necessary to provide some kind of macro prototype! This is possible, and very safely, but we have to tolerate a little tricky syntax and some restrictions, also.

Can you figure out what the following lines do?

xxMacroPrototype(PrintData, int x; float y; char *z; int n; );
#define PrintData(X, Y, Z, N) { \
    PrintData data = { .x = (X), .y = (Y), .z = (Z), .n = (N) }; \
    printf("%d %g %s %d\n", data.x, data.y, data.z, data.n); \
  }
PrintData(1, 3.14, "Hello", 4);
  • The 1st line "defines" the prototype for the macro PrintData.
  • Below it, the function-like macro PrintData is defined.
  • The 3rd line declares a temporary variable data which collects all the arguments of the macro at once.
  • This step has to be written manually, with care, by the programmer... but the syntax is easy, and the compiler rejects (at least) arguments whose type is incompatible with the corresponding temporary variable.
  • (However, the compiler will be silent about the "reversed" assignment .x = (N), .n = (X)).

To declare a prototype, we write xxMacroPrototype with 2 arguments:

  1. The name of the macro.
  2. The list of types and names of the "local" variables that will be used inside the macro. We will call these items the pseudoparameters of the macro.

    • The list of pseudoparameters has to be written as a list of type-variable pairs, separated (and ended) by semicolons (;).

    • In the body of the macro, the first statement will be a declaration of this form:
      MacroName foo = { .pseudoparam1 = (MacroPar1), .pseudoparam2 = (MacroPar2), ..., .pseudoparamN = (MacroParN) }

    • Inside the macro, the pseudoparameters are accessed as foo.pseudoparam1, foo.pseudoparam2, and so on.

The definition of xxMacroPrototype() is as follows:

#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME

Simple, isn't it?

  • The pseudoparameters are implemented as a typedef struct.
  • The compiler checks that ARGS is a well-formed list of type-identifier pairs.
  • The compiler gives understandable diagnostics when it is not.
  • The list of pseudoparameters has the same restrictions as a struct declaration. (For example, a flexible array member can only appear at the end of the list, and variable-length arrays are not allowed at all.) (In particular, it is recommended to use pointer declarators instead of array declarators of variable size as pseudoparameters.)
  • It is not guaranteed that NAME is a real macro-name (but this fact is not too relevant).
    What matters is that we know that some struct-type has been defined "there", associated to the parameter-list of a macro.
  • It is not guaranteed that the list of pseudoparameters, provided by ARGS actually coincides in some way with the list of arguments of the real macro.
  • It is not guaranteed that a programmer will use this correctly inside the macro.
  • The scope of the struct-type declaration is the same as the point where the xxMacroPrototype invocation is done.
  • It is recommended practice to put together the macro prototype immediately followed by the corresponding macro definition.
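As a further illustration of the prototype technique, here is a hedged sketch of a block macro that writes its result through a pointer. The names ScaleInto and demo_scale are hypothetical. It also shows a limit of the type checking: an argument only has to be convertible to the pseudoparameter type, exactly as with a prototyped function, so a double is silently truncated to int.

```c
#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME

/* Prototype immediately followed by the macro definition, as recommended. */
xxMacroPrototype(ScaleInto, int *dest; int value; int factor; );
#define ScaleInto(DEST, VALUE, FACTOR) do { \
    ScaleInto args = { .dest = (DEST), .value = (VALUE), .factor = (FACTOR) }; \
    *args.dest = args.value * args.factor; \
  } while (0)

/* Hypothetical helper: returns the value stored by the macro. */
static int demo_scale(double raw, int factor)
{
    int out = 0;
    ScaleInto(&out, raw, factor);  /* raw is converted to int, like a function argument */
    return out;
}
```

With this sketch, demo_scale(3.9, 2) gives 6, because 3.9 is stored as 3. Passing an argument of an incompatible type, for example a string literal for VALUE, draws a diagnostic that points at the .value initializer.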

However, it is easy to be disciplined with this kind of declaration, and it is easy for the programmer to respect the rules.

Can a block-macro 'return' a value?

Yes. Actually, it can hand back as many values as you want, by simply passing arguments by reference, as scanf() does.
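A minimal sketch of that idea, following the block-macro discipline of section 2. DIVMOD, the tmp_ variable names and the helper demo_divmod are hypothetical.

```c
#define xxbeg_macroblock do {
#define xxend_macroblock } while(0)

/* Hypothetical block macro: integer division with two "returned" values. */
#define DIVMOD(Num, Den, QuotPtr, RemPtr)   \
  xxbeg_macroblock                          \
       int  tmp_num  = (Num);               \
       int  tmp_den  = (Den);               \
       int *tmp_quot = (QuotPtr);           \
       int *tmp_rem  = (RemPtr);            \
       *tmp_quot = tmp_num / tmp_den;       \
       *tmp_rem  = tmp_num % tmp_den;       \
  xxend_macroblock

/* Hypothetical helper: 17 divided by 5, results written through the pointers. */
static void demo_divmod(int *quotient, int *remainder)
{
    DIVMOD(17, 5, quotient, remainder);
}
```

After demo_divmod(&q, &r), q holds 3 and r holds 2. The call itself is still a statement, not an expression.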

But you probably are thinking of something else:

(4.) 2nd case. Function-like macros

For them, we need a slightly different method to declare macro prototypes, one that includes a type for the returned value. Also, we'll have to learn a (not hard) technique that lets us keep the safety of block macros, with a return value having the type we want.

The typechecking of arguments can be achieved as shown here:

In block-macros we can declare the struct variable NAME just inside the macro itself,
thus keeping it hidden to the rest of the program. For function-like macros this cannot be done (in standard C99). We have to define a variable of type NAME before any invocation of the macro. If we are ready to pay this price, then we can earn the desired "safe function-like macro", with returning values of a specific type.
We show the code, with an example, and then we comment it:

#define xxFuncMacroPrototype(RETTYPE, MACRODATA, ARGS) typedef struct { RETTYPE xxmacro__ret__; ARGS } MACRODATA

xxFuncMacroPrototype(float, xxSUM_data, int x; float y; );
xxSUM_data xxsum;
#define SUM(X, Y) ( xxsum = (xxSUM_data){ .x = (X), .y = (Y) }, \
    xxsum.xxmacro__ret__ = xxsum.x + xxsum.y, \
    xxsum.xxmacro__ret__)

printf("%g\n", SUM(1, 2.2));

The first line defines the "syntax" for function-macro prototypes.
Such a prototype has 3 arguments:

  1. The type of the "return" value.
  2. The name of the "typedef struct" used to hold the pseudoparameters.
  3. The list of pseudoparameters, separated (and ended) by semicolon (;).

The "return" value is an additional field in the struct, with a fixed name: xxmacro__ret__.
This is declared, for safety, as the first element in the struct. Then the list of pseudoparameters is "pasted".

When we use this interface (if you let me call it this way), we have to follow a series of rules, in order:

  1. Write a prototype declaration giving 3 parameters to xxFuncMacroPrototype() (the 2nd line of the example).
  2. The 2nd parameter is the name of a typedef struct that the macro itself builds, so you do not have to worry about it: just use it (in the example this type is xxSUM_data).
  3. Define a variable whose type is simply that struct-type (in the example: xxSUM_data xxsum;).
  4. Define the desired macro, with the appropriate number of arguments: #define SUM(X, Y).
  5. The body of the macro must be surrounded by parentheses ( ), in order to obtain an EXPRESSION (thus, a "returning" value).
  6. Inside these parentheses, we can separate a long list of operations and function calls by using comma operators (,).
  7. The first operation we need is to "pass" the arguments X, Y, of the macro SUM(X,Y), to the global variable xxsum. This is done by:

xxsum = (xxSUM_data){ .x = (X), .y = (Y) },

Observe that an object of type xxSUM_data is created on the fly with the aid of a compound literal, as provided by C99 syntax. The fields of this object are filled by reading the arguments X, Y of the macro just once, each surrounded by parentheses for safety.
Then we evaluate a list of expressions and function calls, all of them separated by comma operators (,).
Finally, after the last comma, we just write xxsum.xxmacro__ret__, which is the last term in the comma expression and thus the "return" value of the macro.
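The same rules can be applied to other expressions. The following sketch, under the assumption that a file-scope holder variable is acceptable, defines a hypothetical MAX_INT macro (the names xxMAX_data, xxmax and demo_max are hypothetical as well) whose arguments are evaluated exactly once, so arguments with side effects behave as they would with a function.

```c
#define xxFuncMacroPrototype(RETTYPE, MACRODATA, ARGS) \
    typedef struct { RETTYPE xxmacro__ret__; ARGS } MACRODATA

xxFuncMacroPrototype(int, xxMAX_data, int a; int b; );
static xxMAX_data xxmax;   /* holder variable, defined before any use of the macro */
#define MAX_INT(A, B) ( xxmax = (xxMAX_data){ .a = (A), .b = (B) }, \
    xxmax.xxmacro__ret__ = (xxmax.a > xxmax.b) ? xxmax.a : xxmax.b, \
    xxmax.xxmacro__ret__ )

/* Hypothetical helper: both arguments carry side effects. */
static int demo_max(int *i, int *j)
{
    return MAX_INT((*i)++, (*j)++);   /* each increment happens exactly once */
}
```

Starting from i = 4 and j = 7, demo_max(&i, &j) returns 7 and leaves i at 5 and j at 8. Because every call shares the single variable xxmax, a macro written this way is not safe to use from several threads at once.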

Why all that stuff? Why a typedef struct? Using a struct is better than using individual variables, because the information is packed into one object and the data stay hidden from the rest of the program. We don't want to define "a lot of variables" to hold the arguments of each macro in the program. Instead, by systematically defining a typedef struct associated with each macro, such macros become easier to handle.

Can we avoid the "external variable" xxsum above? Since compound literals are lvalues, one might believe that this is possible.
In fact, we can define this kind of macros, as shown in:

But in practice, I cannot find the way to implement it in a safe way.
For example, the macro SUM(X,Y) above cannot be implemented with this method only.
(I tried to make some tricks with pointer-to-struct + compound literals, but it seems impossible).

UPDATE:

(5.) Breaking my code.

The example given in Section 1 can be broken this way (as Chris Dodd showed me in his comment, below):

int x = 5;          /* x defined outside the macro */
PRINTINT_SQUARE(x);

Since inside the macro there is another object named x (this: int x = (X);, where X is the formal parameter of the macro PRINTINT_SQUARE(X)), what is actually "passed" as argument is not the "value" 5 defined outside the macro, but another one: a garbage value.
To understand it, let us unroll the two lines above after macro expansion:

int x = 5;
{ int x = (x); printf("%d\n", x*x); }

The variable x inside the block is initialized... with its own indeterminate value!
In general, the technique developed in sections 1 to 3 for block macros can be broken in a similar way, as long as the struct object we use to hold the parameters is declared inside the block.

This shows that this kind of code can be broken, so it is unsafe:

Don't try to declare "local" variables "inside" the macro to hold the parameters.

  • Is there a "solution"? I answer "yes": I think that, in order to avoid this problem in the case of block macros (as developed in sections 1 to 3), we have to repeat what we did for function-like macros, that is: declare the parameter-holding struct variable outside the macro, just after the xxMacroPrototype() line.

This is less ambitious, but anyway it answers the question: "How much is it possible to...?". On the other hand, now we follow the same approach for the two cases: block and function-like macros.
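A hedged sketch of that fix for the block macro of section 1. The names xxSQUARE_data, xxsquare and demo_outer_x are hypothetical; the holder variable is assumed to live at file scope, right after the prototype.

```c
#include <stdio.h>

#define xxMacroPrototype(NAME, ARGS) typedef struct { ARGS } NAME

xxMacroPrototype(xxSQUARE_data, int x; );
static xxSQUARE_data xxsquare;        /* declared outside the macro */
#define PRINTINT_SQUARE(X) do {                      \
    xxsquare = (xxSQUARE_data){ .x = (X) };          \
    printf("%d\n", xxsquare.x * xxsquare.x);         \
  } while (0)

/* Hypothetical helper: returns the value that the macro actually received. */
static int demo_outer_x(void)
{
    int x = 5;              /* the name that broke the block-local version */
    PRINTINT_SQUARE(x);     /* no inner declaration, so this x is the caller's x */
    return xxsquare.x;
}
```

Here the macro prints 25 and the holder contains 5. A caller that declares its own local variable named xxsquare would still shadow the holder; the prefix only makes such a collision unlikely.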

pablo1977
  • While a compiler writer can use macro names beginning with double-underscore, mere mortals using a compiler are not supposed to do so; the names are reserved for the implementation. So, all the double-underscore macros above should be renamed without double-underscores before it is safe for a regular programmer (not the compiler writer) to use. – Jonathan Leffler Aug 25 '13 at 05:34
  • I've changed all the '__' to 'xx'. – pablo1977 Aug 25 '13 at 15:52
  • I've rescanned this answer...and I'm a little confused. It starts with an agenda of 4 points, but doesn't seem to have sections that correspond to those 4 points. You have two chunks, one for 'macros that do not return a value' and one for 'macros that do return a value', which are mentioned in the question. I think you need to think about some level 2 (`##`) headings, and review the order of presentation. – Jonathan Leffler Aug 26 '13 at 03:27
  • Also, you should be aware that your `PRINTF_SQUARE` macro has two limitations: (1) it only requires the argument to be convertible to `int`, and (2) it cannot be used in every context that a true `void printf_square(int x);` function can be used (for example, with a comma operator, in components of `if`, `for`, `while` and `switch` statements (not that it is good style to use such functions in most of those places, but...)). I've not followed all the links yet. I'll also observe that `-Wshadow` has problems with the nested `x`, but that is not a default compilation option with GCC. – Jonathan Leffler Aug 26 '13 at 03:57
  • @JonathanLeffler I've numbered all the section headings. It seems this little change improves the structure of the presentation a lot. Thanks for all the corrections. – pablo1977 Aug 26 '13 at 04:26
  • @JonathanLeffler About the _int_ parameter of PRINTF_SQUARE: it was not the best choice, maybe, but I think that a printing example would be easy to understand. The point is to illustrate the parameter-checking in a macro. More generality (of types) is not intended. About your obs. (2), I wrote a warning at the end of Section 2: block-macros are _statements_ and not _expressions_. I've omitted some details (because the text is too long). However, your examples are clarifying. They could be added if you want. About compilers, I've used GCC with -std=c99. – pablo1977 Aug 26 '13 at 05:01
  • @JonathanLeffler I realized that you are right about PRINTF_SQUARE. So, either the macro has to print any type, or else it has to be clear that only ints will be printed. I decided to change the name of the macro to PRINTINT_SQUARE, since I want to show how to obtain a concrete type for the parameter. – pablo1977 Aug 26 '13 at 16:40
  • @ChrisDodd You are totally right. I will have to improve the text, explaining what happens with your example. With the `typedef struct` technique explained later in the answer, the problem you are pointing out is avoided... for a while. It is still present. I will work on a fix. – pablo1977 Aug 26 '13 at 18:09
  • For 'local' variables, you can use three macros: `#define E(x, y) x ## y` and `#define W(x, y) E(x, y)` and `#define V(x) W(x, __LINE__)`. You can make the `E` macro generate more complex names as desired. In your function-like macros, you can then use `V(x)` in place of `x` and the name will be suffixed with the line number too. In a multi-line macro, the same line number will be used for the whole macro, as required by this workaround. – Jonathan Leffler Aug 26 '13 at 23:13
  • @JonathanLeffler I like your idea. It seems interesting. But, although improbable, it is potentially breakable, too. A "global" variable could have a name equal to V(x). I tried your code by defining `int x18 = 4;` and then (in line 18) by invoking `PRINTINT_SQUARE(x18);`, and the Chris Dodd's situation occurs again, as expected. – pablo1977 Aug 27 '13 at 00:02
  • Agreed — that's why you'd probably make E(x,y) more complex: `#define E(x, y) tmp_ ## x ## _ ## y ## _pmt` perhaps. Now collisions are 'enemy action' (not ['happenstance'](http://en.wikiquote.org/wiki/Ian_Fleming) or 'coincidence'). – Jonathan Leffler Aug 27 '13 at 00:09
  • Can you see this being extended to variadic macros? In particular, those that are similar to `printf` in that as the tail of the parameter list they take an optional format string, and parameters of any type to match the format string? E.g. a macro that might be used like `return Foo(valueToReturn, formatString, param1_int, param2_float)` that would return `valueToReturn` but also, say, display a formatted string? – davidA Nov 25 '15 at 03:11

While the self-answered technique for a function-like macro is clever, it does not provide the "generality" of the original "unsafe" macro, since it will not allow arbitrary types to be passed in. And, once the macro is resigned to only work for a specific type, then it is simpler, safer, and easier to maintain an inline function instead.

static inline float sum_f (float x, float y) { return x + y; }

With C11, you can use the new generic selection operator _Generic to define a macro that can call the appropriate inline function given the type of the arguments. The type selection expression (the first argument to _Generic) is used to determine the type, but the expression itself is not evaluated.

#define SUM(X, Y) \
    _Generic ( (X)+(Y) \
             , float : sum_f(X, Y) \
             , default : sum_i(X, Y) )
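To make the fragment above complete, a sum_i function is also needed. The following sketch assumes an int version with the obvious definition (sum_i is named in the macro but not defined in the text) and declares both functions static inline; the helpers demo_generic and demo_generic_float are hypothetical. It requires a C11 compiler.

```c
static inline float sum_f(float x, float y) { return x + y; }
static inline int   sum_i(int x, int y)     { return x + y; }   /* assumed definition */

#define SUM(X, Y) \
    _Generic ( (X)+(Y) \
             , float : sum_f(X, Y) \
             , default : sum_i(X, Y) )

/* Hypothetical helper: the controlling expression (X)+(Y) is not evaluated
   and only the selected branch runs, so the increment happens exactly once. */
static int demo_generic(int *counter)
{
    return SUM((*counter)++, 2);
}

/* Hypothetical helper: int + float has type float, so sum_f is selected. */
static float demo_generic_float(float f)
{
    return SUM(1, f);
}
```

Any other type, including double, falls into the default branch here, so SUM(1, 2.2) would call sum_i and truncate 2.2 to 2; more associations would be needed for wider coverage.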
jxh