9

According to the C standard, if a program defines or declares a reserved identifier, the behavior is undefined. One category of reserved identifiers is identifiers with external linkage defined in the C standard library.

As an example of a program with undefined behavior, consider the following: file1.c defines a variable named time with external linkage, which conflicts with the time function from the standard library, declared in time.h.

file1.c:

int time;

int foo( void )
{
    return time;
}

file2.c:

#include <time.h>
#include <stdio.h>

extern int foo( void );

int main( void )
{
    foo();
    printf( "current time = %ld\n", (long) time( NULL ) );
    return 0;
}

When the program is compiled and run, a segmentation fault occurs, because the time symbol referenced in file2.c gets linked to the time variable from file1.c rather than to the function in the C library.

$ gcc -c -o file1.o file1.c
$ gcc -c -o file2.o file2.c
$ gcc -o test file1.o file2.o 
$ ./test
Segmentation fault (core dumped)

I'm wondering if there is any way for GCC to detect the usage of conflicting, reserved identifiers in user code, at compile or link time. Here's my motivation: I'm working on an application where users can write C extensions to the application, which get compiled and linked to the rest of the application. If the user's C code uses reserved identifiers like the example above, the resulting program can fail in hard-to-predict ways.

One solution which comes to mind is to run something like nm on the user's object files, and compare the defined symbols against a list of reserved identifiers from the C library. However, I am hoping to find something in GCC which can detect the issue. Does anyone know if that is possible, or have any suggestions?
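To make the nm idea concrete, here is a minimal sketch of the comparison step. It assumes the default nm output format ("address type name", where an uppercase type letter other than U marks a defined global symbol). The names `reserved_names`, `name_is_reserved`, and `nm_line_conflicts` are hypothetical, and the list is a tiny sample rather than the full set of library identifiers.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately short sample list; a real check would need
   every external identifier of the C library actually in use. */
static const char *const reserved_names[] = {
    "abort", "exit", "free", "malloc", "memcpy", "printf", "strlen", "time"
};

/* Returns 1 if name appears in the sample list above, otherwise 0. */
int name_is_reserved(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof reserved_names / sizeof reserved_names[0]; i++) {
        if (strcmp(name, reserved_names[i]) == 0)
            return 1;
    }
    return 0;
}

/* Examines one line of default-format nm output.
   Returns 1 if the line describes a defined global symbol whose name is
   in the sample list, otherwise 0. Lines with fewer than three fields
   (such as undefined symbols, which have no address) are ignored. */
int nm_line_conflicts(const char *line)
{
    char addr[256], type[256], name[256];

    if (sscanf(line, "%255s %255s %255s", addr, type, name) != 3)
        return 0;
    /* Assumption: uppercase type letter = global, lowercase = local. */
    if (type[1] != '\0' || !isupper((unsigned char)type[0]) || type[0] == 'U')
        return 0;
    return name_is_reserved(name);
}
```

A small driver could read the output of something like `nm -g file1.o` line by line with fgets and report every line for which `nm_line_conflicts` returns 1.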

rkuczwara
  • Good question. Indeed GCC is able to warn about many identifiers, especially with implicit declaration not matching the standard, but not about this. – Antti Haapala -- Слава Україні Oct 26 '18 at 15:24
  • You could start with disallowing global variables. gcc should be able to detect collisions with the standard library in the same translation unit. – Lundin Oct 26 '18 at 15:37
  • 1
    or make a simple parser that checks that the user's exported symbols follow a simple syntax which would prevent any shadowing, e.g., enforce that all of the user's global symbols be prefixed with `usr_` or something – TDk Oct 26 '18 at 15:38
  • This isn't really anything the compiler can give errors about, since it only deals with [translation units](https://en.wikipedia.org/wiki/Translation_unit_(programming)). The linker might be able to do it, but with C a symbol is just a symbol, there's really no semantic information about symbols available for the linker to be able to detect it either. – Some programmer dude Oct 26 '18 at 15:48
  • @Someprogrammerdude: A compiler certainly could report errors for attempts to define reserved identifiers in non-implementation source files (that is, “user” source files), and compilers do not “only deal with translation units.” The C standard specifies many things in terms of translation units, but compilers are not required to slavishly follow that. Clang (and I think GCC) already handles user header files differently from system or implementation header files (it will, at least depending on switches, issue warnings for things in user header files that it will not in system header files). – Eric Postpischil Oct 26 '18 at 17:01
  • 1
    Another option: write all of your code as library code, properly namespacing all exported identifiers to a namespace prefix and the problem no longer exists. – Petr Skocik Oct 26 '18 at 17:28
  • Thanks for the suggestions y'all. I think the suggestion of @PSkocik to pre-include 'all-of-c' is the way to go for me, since I can add the -include argument to the compiler command. The other suggestions like forbidding globals or requiring globals to have a prefix requires changing the behavior of people who write the 'user code', which I'm trying to avoid. – rkuczwara Oct 26 '18 at 21:29

2 Answers

4

You could grab a libc implementation that you can link statically, and link all of it into your object files with -Wl,--whole-archive.

main.c:

int time=42;
int main(){}

link it with a whole libc:

$ musl-gcc main.c -static -Wl,--whole-archive

If you get a multiple definition error or a type/size/alignment of symbol changed warning, you're clashing with your libc.

/usr/local/bin/ld: /usr/local/musl/lib/libc.a(time.lo): in function `time':
/home/petr/f/proj/bxdeps/musl/src/time/time.c:5: multiple definition of `time'; /tmp/cc3bL3pP.o:(.data+0x0): first defined here

Alternatively (and more robustly), you could pre-include an all-of-C (or all-of-POSIX) header, for example with GCC's -include option, and have the compiler tell you where you're clashing with it. I'd do it only once in a while, because otherwise it's going to somewhat pessimize your build times (although even including all of POSIX generally isn't as bad as including a single C++ header).

Petr Skocik
  • 1
    1) From time to time, I too found including all `#include`s informative. 2) Learned a new word today: [pessimize](https://en.wiktionary.org/wiki/pessimize). – chux - Reinstate Monica Oct 26 '18 at 17:07
  • 1
    @chux For a while I preincluded a precompiled whole-of-c-and-posix header and didn't even bother with selectively including system stuff. Compared to the cost of even "cheap" C++ includes (e.g., about 300ms to process on my PC), an all-of-posix/all-of-c is quite cheap (~40ms on my PC). – Petr Skocik Oct 26 '18 at 17:25
3

I'm wondering if there is any way for GCC to detect the usage of conflicting, reserved identifiers in user code, at compile or link time.

This adds detail to @PSkocik's good answer.
One way to detect many conflicts is to include all of the standard header files. Compilation times may increase noticeably.

Determine version

#if defined(__STDC__)
# define STANDARD_C89
# if defined(__STDC_VERSION__)
#  define STANDARD_C90
#  if (__STDC_VERSION__ >= 199409L)
#   define STANDARD_C95
#  endif
#  if (__STDC_VERSION__ >= 199901L)
#   define STANDARD_C99
#  endif
#  if (__STDC_VERSION__ >= 201112L)
#   define STANDARD_C11
#  endif
#  if (__STDC_VERSION__ >= 201710L)
#   define STANDARD_C18
#  endif
# endif
#endif

Include them, some selectively.

#include <assert.h>
//#include <complex.h>
#include <ctype.h>
#include <errno.h>
//#include <fenv.h>
#include <float.h>
//#include <inttypes.h>
//#include <iso646.h>
#include <limits.h>
#include <locale.h>
#include <math.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
//#include <stdalign.h>
//#include <stdatomic.h>
//#include <stdbool.h>
#include <stddef.h>
//#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
//#include <stdnoreturn.h>
#include <string.h>
//#include <tgmath.h>
//#include <threads.h>
#include <time.h>
//#include <uchar.h>
//#include <wchar.h>
//#include <wctype.h>

//////////////////////////////
#ifdef STANDARD_C95
#include <iso646.h>
#include <wchar.h>
#include <wctype.h>
#endif

//////////////////////////////
#ifdef STANDARD_C99
#ifndef __STDC_NO_COMPLEX__
#include <complex.h>
#endif
#include <fenv.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <tgmath.h>
#endif

//////////////////////////////
#ifdef STANDARD_C11
#include <stdalign.h>
#ifndef __STDC_NO_ATOMICS__
#include <stdatomic.h>
#endif
#ifndef __STDC_NO_THREADS__
#include <threads.h>
#endif
#include <stdnoreturn.h>
#include <uchar.h>
#endif

I am certain the above needs some refinements and would appreciate advice on that.


To avoid adding names to the name space, replace macros like #define STANDARD_C11 with direct tests of the predefined macros:

// #ifdef STANDARD_C11
//  ... C11 includes
// #endif

#if defined(__STDC__)
# if defined(__STDC_VERSION__)
#  if (__STDC_VERSION__ >= 201112L)
     /* ... C11 includes */
#  endif
# endif
#endif

Although the goal is "According to the C standard ...", additional code may be needed to accommodate popular compiler extensions and slight variations from the standard.
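One way to cope with such variations is to guard each include individually. The sketch below is not part of the original answer: it assumes a compiler that provides `__has_include` (standard in C23 and available as an extension in recent GCC and Clang), and the `PROBE_*` macros and `probe_header_count` are hypothetical names used only to make the result observable.

```c
/* Include a header only if the implementation actually ships it.
   The outer defined() test keeps the inner #if from being evaluated on
   compilers that lack __has_include. */
#if defined(__has_include)
# if __has_include(<iso646.h>)
#  include <iso646.h>
#  define PROBE_HAVE_ISO646_H 1
# endif
# if __has_include(<threads.h>)
#  include <threads.h>
#  define PROBE_HAVE_THREADS_H 1
# endif
#endif

/* Hypothetical helper macros: default to 0 when a header was not found
   or could not be probed. */
#ifndef PROBE_HAVE_ISO646_H
# define PROBE_HAVE_ISO646_H 0
#endif
#ifndef PROBE_HAVE_THREADS_H
# define PROBE_HAVE_THREADS_H 0
#endif

/* Returns how many of the two probed headers were included. */
int probe_header_count(void)
{
    return PROBE_HAVE_ISO646_H + PROBE_HAVE_THREADS_H;
}
```

The helper macros do add names of their own, so in a real pre-include header they would likely be dropped and only the guarded `#include` lines kept.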

chux - Reinstate Monica
  • I'd drop the STANDARD_* macros and replace them with inline tests against __STDC__ (to not pollute the namespace unnecessarily). For real maximum portability, it's also good to couple it with a configuration test system (like autotools) and additionally guard each include (because, e.g., tinycc doesn't have iso646.h). (It's even more important if you want to include all of POSIX and have it work on Linux/Cygwin/Mac and various compilers.) – Petr Skocik Oct 26 '18 at 18:44
  • @PSkocik Hmmm, if TinyCC uses 199901L yet does not have iso646.h, is it compliant? – chux - Reinstate Monica Oct 26 '18 at 18:56
  • Not 100%. (So maybe I shouldn't care about it, but I do, because I like the speed :D) – Petr Skocik Oct 26 '18 at 19:09