In my C code I have an external symbol, some_symbol. I need to get the address of the memory position just preceding that symbol (&some_symbol-1). This used to work fine in older versions of gcc, but on gcc 12.2.0 with -O2 enabled I get an array-bounds warning:

    #include <stdio.h>

    extern void *some_symbol;

    int main (void) {
        printf ("%p\n",&some_symbol-1);
        return 0;
    }

    $ cc -Wall -O2 -c x.c -o x.o
    x.c: In function ‘main’:
    x.c:6:9: warning: array subscript -1 is outside array bounds of ‘void[8]’ [-Warray-bounds]
        6 |         printf ("%p\n",&some_symbol-1);
          |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    x.c:3:14: note: at offset -8 into object ‘some_symbol’ of size 8
        3 | extern void *some_symbol;
          |              ^~~~~~~~~~~

I understand why this is dangerous. But in this case, I'm referring to a symbol from the run time system (RTS) of another language, and the RTS documents that there is space immediately before this symbol (at lower addresses), which in some cases needs to be referenced. Here is an example of what the RTS emits:

    .data
some_symbol_name:
    .string "some_symbol\0"

    .text
    .quad   some_symbol_name
    .globl  some_symbol
some_symbol:
    # ...

I can circumvent the problem by doing the arithmetic on uintptr_t instead (second line):

    printf ("%s\n",*(char**)(&some_symbol-1));
    printf ("%s\n",*(char**)((uintptr_t)&some_symbol-sizeof(void*)));

Both of these correctly print some_symbol, but the first gives a compilation warning similar to the one above.

The API of the external system guarantees that there is readable data immediately before some_symbol, but how do I tell this to gcc? Using uintptr_t everywhere is unwieldy.

I know that I can disable the warning (locally), but would prefer not to.

Is there a way to specify in the extern declaration of the symbol that there is space before the symbol that can be referenced?

  • You can't tell it that it is safe, because it is not; you need to disable the warning on your command line. – 0___________ Mar 23 '23 at 09:01
  • Are you sure that you need a access memory before the variable `some_symbol`? Or rather you need to access memory before a location that `some_symbol` points to? – tstanisl Mar 23 '23 at 09:28
  • @tstanisl I'm not sure I understand the difference. `some_symbol` is a function entry point, and the RTS of this language stores some metadata (in this case a pointer to the name of the function) in the memory immediately before each function. So eventually I need to dereference `&some_symbol-1` -- but it already goes wrong when I try to compute the pointer. I think that may not answer your question though? –  Mar 23 '23 at 09:34
  • Did you check https://stackoverflow.com/questions/3378560/how-to-disable-gcc-warnings-for-a-few-lines-of-code? – Support Ukraine Mar 23 '23 at 09:36
  • @SupportUkraine yes, but that gives me four lines -- then I would prefer the `intptr_t` workaround. But I was really hoping for a way to change the `extern` declaration to tell gcc the size of the object not only before but also after the label. (A similar case is if the label would point to the middle of an array, and you want to get elements from the part of the array before the symbol. I realize it may not be a typical use case.) –  Mar 23 '23 at 09:39
  • I don't think that metadata are attached to `some_symbol`. It could only be achieved with a special linking script. I guess that metadata are stored before the location that `some_symbol` points to. Change declaration to `void ** some_symbol;` and check if "the metadata" are available at `some_symbol[-1]`. – tstanisl Mar 23 '23 at 09:40
  • @tstanisl I have updated to give an assembly example of the external code I'm dealing with. If I'm not mistaken `some_symbol[-1]` would assume that there is a pointer stored at `some_symbol`. This is not the case. `some_symbol` is a function entry point, and I need to read the code/data before that entry point. –  Mar 23 '23 at 10:23
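
The distinction tstanisl is drawing is between reading the word before the symbol's own address (`&some_symbol - 1`) and reading the word before whatever address is stored in the symbol (`some_symbol[-1]` with `void **some_symbol`). A minimal, self-contained sketch of the two layouts follows; `block1`, `block2`, `sym2` and both functions are hypothetical stand-ins, not the real RTS. Because the stand-ins are arrays, the `-1` stays in bounds here, which is exactly what is not true of the real extern symbol.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real RTS.
   Layout 1 (the asker's case): the metadata word sits immediately before
   the symbol itself; block1[1] plays the role of some_symbol. */
static void *block1[2] = { "name 1", NULL };

/* Layout 2 (tstanisl's guess): the symbol is a pointer variable whose
   value points just past the metadata word. */
static void *block2[2] = { "name 2", NULL };
static void **sym2 = &block2[1];

/* Layout 1: start from the symbol's address (&some_symbol) and step back. */
static void *word_before_symbol(void)
{
    void **addr_of_symbol = &block1[1];
    return addr_of_symbol[-1];   /* reads block1[0] */
}

/* Layout 2: load the pointer stored in the symbol, then index by -1. */
static void *word_before_pointee(void)
{
    void **value_of_symbol = sym2;
    return value_of_symbol[-1];  /* reads block2[0] */
}
```

With the RTS described in the question only the first form matches: `some_symbol` is a code label, so the bytes at it are instructions, not a stored pointer.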

1 Answer

Pointer arithmetic is only well-defined within the bounds of an array, including the position one past its last element. For this purpose, a single variable is regarded as an array of one element.

In this case &some_symbol is a void ** pointing to a single void * object, so &some_symbol-1 points before the start of that object and invokes undefined behavior.
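
The rule can be seen on an ordinary variable: for pointer arithmetic a non-array object behaves like a one-element array (C17 6.5.6 ¶7), so `&x + 1` may be formed and compared, but `&x - 1` may not. A minimal sketch; `x` and `single_object_span` are illustrative names only.

```c
#include <stddef.h>

static int x;   /* a single object, treated as int[1] for pointer arithmetic */

static ptrdiff_t single_object_span(void)
{
    int *first = &x;          /* valid: element 0 */
    int *one_past = &x + 1;   /* valid: one past the end; must not be dereferenced */
    /* int *before = &x - 1;     undefined behavior: before the start of the object */
    return one_past - first;  /* 1 */
}
```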

Casting to uintptr_t is the only sensible solution. That is: (uintptr_t)&some_symbol - sizeof(void *), because integer arithmetic counts in bytes rather than in elements. Or, if that's for some reason too unwieldy, perhaps you could cook up a function-like macro:
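
The `sizeof(void *)` is needed because the units change: pointer arithmetic counts in elements, while arithmetic on `uintptr_t` counts in bytes. That holds on GCC's usual targets, where converting a pointer to `uintptr_t` yields its byte address; the standard itself leaves the mapping implementation-defined. A small sketch using a hypothetical two-element array `words`:

```c
#include <stdint.h>

static void *words[2];   /* hypothetical array used only to measure one step */

/* Distance between two adjacent elements, measured on uintptr_t. */
static uintptr_t byte_step(void)
{
    uintptr_t a0 = (uintptr_t)&words[0];
    uintptr_t a1 = (uintptr_t)(&words[0] + 1);   /* one element further */
    return a1 - a0;   /* sizeof(void *) bytes on GCC targets, not 1 */
}
```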

    #define get_offset(ptr, n) ((uintptr_t)(ptr) + sizeof(*(ptr))*(n))

Usage:

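    #include <stdio.h>
    #include <inttypes.h>   /* PRIuPTR; also provides uintptr_t via <stdint.h> */

    /* Byte address of ptr + n elements. A negative n relies on unsigned
       wrap-around, which gives the right result when size_t and uintptr_t
       have the same width (true on common targets). */
    #define get_offset(ptr, n) ((uintptr_t)(ptr) + sizeof(*(ptr))*(n))

    extern void *some_symbol;

    int main (void) {
        printf("%" PRIuPTR "\n", (uintptr_t)&some_symbol);
        printf("%" PRIuPTR "\n", get_offset(&some_symbol, -1));
        return 0;
    }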
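
If the goal is to read the word rather than just compute its address, a `static inline` function is an alternative to the macro: it type-checks its argument and hides the cast. The sketch below is a hypothetical helper (`word_before` is not an existing API), and `fake_rts` is a stand-in for the RTS layout so the example is self-contained: slot 0 holds the name pointer and slot 1 plays the role of `some_symbol`. Whether GCC stays silent after inlining has not been verified here; the asker reports only that the equivalent integer-cast expression does not warn.

```c
#include <stdint.h>

/* Hypothetical helper: returns the pointer-sized word stored immediately
   before `entry`. The subtraction happens on uintptr_t, so no out-of-bounds
   pointer is formed by C pointer arithmetic; converting the integer back to
   a pointer is implementation-defined (a plain address on GCC targets). */
static inline void *word_before(const void *entry)
{
    uintptr_t addr = (uintptr_t)entry - sizeof(void *);
    return *(void *const *)addr;
}

/* Stand-in for the RTS layout, for self-contained testing only. */
static void *fake_rts[2] = { "some_symbol", 0 };
```

With the real symbol it would be used as `printf("%s\n", (const char *)word_before(&some_symbol));`.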
Lundin
  • Thanks. I will probably accept this, but since it's an answer in the negative I'll leave the question open in case somebody does have a solution; I can't verify that it is correct. –  Mar 23 '23 at 12:10
  • @user21463071 Pointer arithmetic out of bounds of a variable is definitely undefined behavior. If you don't believe me you can read all about it in C17 6.5.6 §8. – Lundin Mar 23 '23 at 12:16
  • I believe that, but it isn't clear to me that `some_symbol` cannot be declared in a way that makes the access not out of bounds. For instance, I could imagine something like `extern char *some_symbol[-1]` to specify the available space before the symbol. Of course, this syntax is nonsensical, but that doesn't mean that there is no other way to specify this. –  Mar 23 '23 at 13:10
  • @user21463071 If you are pointing into some other larger declared object, then sure. Otherwise most embedded system compilers support non-standard arithmetic on raw physical addresses. As for gcc, who knows - its behavior is changed at a whim these days. – Lundin Mar 23 '23 at 14:02
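
Lundin's first case can be sketched as follows: when C can see a single object that covers both the metadata and the entry, pointing into its middle and stepping back is ordinary, well-defined pointer arithmetic. `entry_block`, `entry` and `name_of_entry` are hypothetical; this does not describe how the external RTS object is actually declared, and for a symbol defined in assembly no such C object exists.

```c
#include <stddef.h>

/* Hypothetical object spanning the metadata word and the "entry" word. */
static const char *entry_block[2] = { "some_symbol", NULL };

/* Points into the middle of entry_block, like a label placed after the metadata. */
static const char **const entry = &entry_block[1];

static const char *name_of_entry(void)
{
    return entry[-1];   /* same as *(entry - 1): reads entry_block[0], in bounds */
}
```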