I am working on a project where I need to build the address range of many, many global variables in C (C++ is not an option), with clang. For symbols of complete types, this is easy in a standard-compliant way:
typedef struct range {
    void* begin;
    void* end;
} range;
extern int foo;
range foo_range = { &(&foo)[0], &(&foo)[1] };
But as I said, it works because the C compiler statically knows the size of foo, so it's able to resolve &(&foo)[1] as foo+4 bytes (assuming that sizeof(int) is 4, of course). This won't work for symbols of an incomplete type:
struct incomplete;
struct trailing_array {
    int count;
    int elements[];
};
extern int foo[];
extern struct incomplete bar;
extern struct trailing_array baz;
range foo_range = { &(&foo)[0], &(&foo)[1] };
// error: foo has incomplete type
range bar_range = { &(&bar)[0], &(&bar)[1] };
// error: bar has incomplete type
range baz_range = { &(&baz)[0], &(&baz)[1] };
// this one compiles, but the range excludes the elements array
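To make the last case concrete, here is a minimal sketch of why the range falls short for a struct with a flexible array member: sizeof only accounts for the fixed part. The helper names (static_extent, true_extent, check_extents) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct trailing_array {
    int count;
    int elements[];
};

/* Bytes covered by the &(&baz)[0] .. &(&baz)[1] range: sizeof the struct,
 * which does not include any of the trailing elements. */
size_t static_extent(void) {
    return sizeof(struct trailing_array);
}

/* Bytes spanned by an object that really holds `count` elements,
 * measured from the start of the struct to the end of the last element. */
size_t true_extent(size_t count) {
    return offsetof(struct trailing_array, elements) + count * sizeof(int);
}

/* Hypothetical sanity check: with any elements present, the real extent
 * is larger than what the compiler can see statically. */
void check_extents(void) {
    assert(offsetof(struct trailing_array, elements) <= static_extent());
    assert(true_extent(3) > static_extent());
}
```

The compiler has no way to know count at the point where only the extern declaration is visible, so the static range can never include the elements.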
However, it's not a problem for me to describe these symbols some more. For instance, I can easily add metadata:
// foo.h
#include <stddef.h> // for size_t
extern int foo[];
extern size_t foo_size;
// foo.c
int foo[] = {1,2,3};
size_t foo_size = sizeof(foo);
Except that this won't help my problem for references outside of foo.c, because foo_size is not a compile-time constant, and therefore this wouldn't work:
range foo_range = { &foo, (void*)&foo + foo_size };
// error: foo_size not a compile-time constant
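For comparison, the same metadata is perfectly usable at run time; it just cannot appear in a static initializer, so it falls short of what I need. A minimal sketch, with the contents of foo.h and foo.c folded into one file to keep it self-contained (make_foo_range and check_runtime_range are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

typedef struct range {
    void *begin;
    void *end;
} range;

/* What foo.h would declare. */
extern int foo[];
extern size_t foo_size;

/* What foo.c would define. */
int foo[] = {1, 2, 3};
size_t foo_size = sizeof(foo);

/* Allowed inside a function: foo_size is read when the function runs.
 * The cast goes through char* so the pointer arithmetic is standard C. */
range make_foo_range(void) {
    range r = { foo, (char *)foo + foo_size };
    return r;
}

/* Hypothetical sanity check on the computed range. */
void check_runtime_range(void) {
    range r = make_foo_range();
    assert((char *)r.end - (char *)r.begin == (ptrdiff_t)sizeof(foo));
}
```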
What would work, however, is getting the address of a symbol that ends right where my object ends. For instance, if I define foo with this assembly code:
_foo:
    .long 1
    .long 2
    .long 3
_foo_end:
Then, in my C code, I can have:
extern int foo[];
extern int foo_end;
range foo_range = { &foo, &foo_end };
and that effectively solves my problem.
However, while I have the flexibility to add symbols, I don't have the flexibility to rewrite every global declaration as a file-level assembly statement. So, my question is: what is the closest that I can get to that using clang?
- I know that I can use sections (since the linker makes start and end symbols for sections), but one section per global variable would be way overkill.
- I know that I can't just take the address of a variable immediately after the global whose range I want to get, because the compiler has been known to reorder globals in some cases.
I'm specifically using Apple's linker, but if you have a solution that works for GNU ld/gold or lld, I'll still take it and see if I can get it to work here too.
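For reference, this is roughly what the one-section-per-variable approach from the first bullet looks like. It is a sketch under assumptions, not something I consider a solution: the section name foo_sec and the helper names are made up, the ELF branch assumes the linker synthesizes __start_/__stop_ symbols for sections whose names are valid C identifiers (GNU ld, gold and lld do, as far as I know), and the Apple branch assumes ld64's section$start$/section$end$ symbols and has not been verified here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct range {
    void *begin;
    void *end;
} range;

#ifdef __APPLE__
/* Assumption: ld64 provides section$start$SEG$SECT and section$end$SEG$SECT. */
#define FOO_SECTION __attribute__((used, section("__DATA,foo_sec")))
extern char foo_sec_start[] __asm("section$start$__DATA$foo_sec");
extern char foo_sec_stop[]  __asm("section$end$__DATA$foo_sec");
#else
/* Assumption: an ELF linker that provides __start_<name> and __stop_<name>. */
#define FOO_SECTION __attribute__((used, section("foo_sec")))
extern char foo_sec_start[] __asm("__start_foo_sec");
extern char foo_sec_stop[]  __asm("__stop_foo_sec");
#endif

/* The only object placed in foo_sec, so the section bounds are its bounds. */
FOO_SECTION int foo[] = {1, 2, 3};

/* Both bounds are link-time address constants, so a static initializer works. */
range foo_range = { foo_sec_start, foo_sec_stop };

size_t range_size(range r) {
    return (size_t)((char *)r.end - (char *)r.begin);
}

/* Hypothetical sanity check: the section range matches the object. */
void check_foo_range(void) {
    assert(foo_range.begin == (void *)foo);
    assert(range_size(foo_range) == sizeof(foo));
}
```

This gives the right range even when the consumer only sees the extern declaration, but it needs a dedicated section (and a pair of declarations) for every global, which is exactly the overhead I would like to avoid.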