Merging global arrays at link time / filling a global array from multiple compilation units

Question

I want to define an array of things, like event handlers. The contents of this array are completely known at compile time, but are defined across multiple compilation units, distributed among multiple libraries that are fairly decoupled, at least until the final (static) link. I'd like to keep it that way too, so that adding or deleting a compilation unit automatically adds or removes its event handler without having to modify a central list of event handlers.

Here's an example of what I'd like to do (but does not work).

central.h:

typedef void (*callback_t)(void);

callback_t callbacks[];

central.c:

#include "central.h"

void do_callbacks(void) {
    int i;
    for (i = 0; i < sizeof(callbacks) / sizeof(*callbacks); ++i)
        callbacks[i]();
}

foo.c:

#include "central.h"

void callback_foo(void) { }

callback_t callbacks[] = {
    &callback_foo
};

bar.c:

#include "central.h"

void callback_bar(void) { }

callback_t callbacks[] = {
    &callback_bar
};

What I'd like to happen is to get a single callbacks array, which contains two elements: &callback_foo and &callback_bar. With the code above, there are obviously two problems:

The callbacks array is defined multiple times.
sizeof(callbacks) isn't known when compiling central.c.

It seems to me that the first point could be solved by having the linker merge the two callbacks symbols instead of throwing an error (possibly through some attribute on the variable), but I'm not sure if there is something like that. Even if there is, the sizeof problem should somehow also be solved.

I realize that a common solution to this problem is to just have a startup function or constructor that "registers" the callback. However, I can see only two ways to implement this:

Use dynamic memory (realloc) for the callbacks array.
Use static memory with a fixed (bigger than usually needed) size.

Since I'm running on a microcontroller platform (Arduino) with limited memory, neither of these approaches appeals to me. And given that the entire contents of the array are known at compile time, I'm hoping for a way to let the compiler also see this.

I've found this and this solution, but those require a custom linker script, which is not feasible in the compilation environment I'm running (especially since this would require explicitly naming each of these special arrays in the linker script, so just having a single linker script addition doesn't work here).

This solution is the best I have found so far. It uses a linked list that is filled at runtime, but uses memory allocated statically in each compilation unit separately (e.g. a next pointer is allocated with each function pointer). Still, the overhead of these next pointers should not be required - is there any better approach?
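For reference, the linked-list approach described above can be sketched roughly as follows. This is a minimal single-file illustration, not the linked solution's actual code: the names callback_node, callback_register and register_foo are hypothetical, and it assumes GCC's __attribute__((constructor)) is available. In a real project the list head and the register function would be exported through central.h instead of being static.

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/* One node per callback; each node lives in the static storage of the
 * compilation unit that defines it, so no heap and no fixed-size array. */
struct callback_node {
    callback_t fn;
    struct callback_node *next;   /* this pointer is the per-callback RAM overhead */
};

/* Head of the list, owned by the central module (hypothetical name). */
static struct callback_node *callback_head = NULL;

/* Hypothetical helper: push a statically allocated node onto the list. */
static void callback_register(struct callback_node *node) {
    node->next = callback_head;
    callback_head = node;
}

/* Call every registered callback; returns how many were called. */
static int do_callbacks(void) {
    int n = 0;
    for (struct callback_node *p = callback_head; p != NULL; p = p->next) {
        p->fn();
        ++n;
    }
    return n;
}

/* --- what a client unit such as foo.c would contain --- */
static int foo_calls = 0;
static void callback_foo(void) { ++foo_calls; }

static struct callback_node foo_node = { callback_foo, NULL };

/* GCC-specific: runs before main() and links the node into the list. */
__attribute__((constructor))
static void register_foo(void) {
    callback_register(&foo_node);
}
```

Each client pays for one extra pointer in RAM (the next field), which is exactly the overhead the question would like to avoid.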

Perhaps having a dynamic solution combined with link-time optimization can somehow result in a static allocation?

Suggestions on alternative approaches are also welcome, though the required elements are having a static list of things, and memory efficiency.

Furthermore:

Using C++ is fine, I just used some C code above for illustrating the problem, most Arduino code is C++ anyway.
I'm using gcc / avr-gcc and though I'd prefer a portable solution, something that is gcc only is also ok.
I have template support available, but not STL.
In the Arduino environment that I use, I have no Makefile or other way to easily run some custom code at compile time, so I'm looking for something that can be entirely implemented in the code.

Are you looking for a platform-*independent* solution to this, or was there a specific platform you had in mind? If you're throwing independence out the door, all kinds of oddities are exploitable (like MS's linker alphabetizing its section names, one of my personal favorites). Otherwise you're probably better off with an init'er entry point (or the link you provide, which honestly is pretty slick). — WhozCraig, Jun 18 '14 at 10:44
Have you considered building the callbacks array as part of your makefile? — Sergey L., Jun 18 '14 at 11:12
You have tagged this both, C and C++, but a good answer for one will likely be a bad answer for the other. Maybe just stick to one of them. — PlasmaHH, Jun 18 '14 at 11:51
@PlasmaHH No, this is an embedded system with limited resources, so the solution will be very similar no matter if C or C++. You can forget all about STL and such on these kind of systems. The embedded tag is needed though, I'll edit the post to address this. — Lundin, Jun 18 '14 at 11:53
@Lundin: So you are saying solution that involves template metaprogramming would be a good solution for C, and a hacky mess of macros would be a good solution for C++? — PlasmaHH, Jun 18 '14 at 11:54
@PlasmaHH No, I'm saying that this is a question related to low-level embedded programming, where concepts such as templates doesn't even make sense. If you have ever written such programs in either C or C++, you will know what I'm talking about. As for hacky messes of macros, they are equally possible and equally discouraged in both languages. — Lundin, Jun 18 '14 at 12:02
While RAM is indeed precious, are you certain that the savings you'll get from trying to make a solution are worth the time spend on that solution? How many callbacks (thus RAM) are we really talking here? Consider using a simple solution of registering at construction, and moving on to the rest of your application. If you are getting crunched for space revisit this... It seems like a bit of early optimization to me (granted, I have no idea what you're doing with the arduino, maybe you're already done with the rest...) — Ross, Jun 18 '14 at 13:28
Apologies for my late replies - I had expected stackoverflow to notify me of comments and answers but that didn't happen for some reason. — Matthijs Kooijman, Jun 25 '14 at 10:58
@WhozCraig, platform independence - would be nice, but I'm really only using gcc / avr-gcc right now, so gcc-specific stuff is ok for me. — Matthijs Kooijman, Jun 25 '14 at 11:01
@Sergey-l, generating stuff in the Makefile - I'm using the Arduino IDE, which doesn't allow modifying the build process much, so that won't really work. — Matthijs Kooijman, Jun 25 '14 at 11:02
@Lundin, even though I'm on an embedded environment, templates are available through gcc on avr as normal. STL isn't, though. — Matthijs Kooijman, Jun 25 '14 at 11:03
@Ross, you're right in that just taking a bit of extra RAM use might be the most efficient way to spend my time. However, I'm quite the perfectionist and I've found that this pattern re-occurs regularly in (Arduino) code, so I was hoping for some clean way to solve it once and for all. — Matthijs Kooijman, Jun 25 '14 at 11:06
Fun fact: LLVM IR actually [supports](https://llvm.org/docs/LangRef.html#linkage-appending) just this for implementing global constructors. — Trass3r, Apr 03 '20 at 12:33

Pedro · Answer 1 · 2016-06-09T19:02:00.507

As commented in some previous answer, the best option is to use a custom linker script (with a KEEP(*(SORT(.whatever.*))) input section).

Anyway, it can be done without modifying the linker scripts (working sample code below), at least on some platforms with gcc (tested on an xtensa embedded device and cygwin).

Assumptions:

We want to avoid using RAM as much as possible (embedded)
We do not want the calling module to know anything about the modules with callbacks (it is a lib)
No fixed size for the list (unknown size at library compile time)
I am using GCC. The principle may work on other compilers, but I have not tested it
Callback functions in this sample receive no arguments, but it is quite simple to modify if needed

How to do it:

We need the linker to somehow allocate at link time an array of pointers to functions
As we do not know the size of the array, we also need the linker to somehow mark the end of the array

This is quite specific, as the right way is using a custom linker script, but it happens to be feasible without doing so if we find a section in the standard linker script that is always "kept" and "sorted".

Normally, this is true for the .ctors.* input sections (standard ld scripts keep them and sort them by section name, which gcc relies on to run constructors in priority order), so we can hack a little and give it a try.

Just take into account that it may not work for all platforms (I have tested it in xtensa embedded architecture and CygWIN, but this is a hacking trick, so...).

Also, as we are putting the pointers in the constructors section, we need to use one byte of RAM (for the whole program) to skip the callback code during C runtime init.
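The guard idea can be shown in isolation with a small sketch. It is only an illustration under assumptions: it uses GCC's __attribute__((constructor)) instead of placing a pointer in a .ctors.* section, and the names callback_ctor_stub, callback_wrapper and run_callbacks are hypothetical stand-ins for what the macros further down generate.

```c
/* One flag for the whole program: 0 while the C runtime runs the
 * constructors, set to 1 once the callbacks are really wanted. */
static int callback_ctor_stub = 0;

static int real_calls = 0;

/* The actual callback body written by the client. */
static void real_callback(void) { ++real_calls; }

/* Wrapper that the startup code runs like any other constructor.
 * While the flag is still 0 it does nothing, so the real callback is
 * skipped during C runtime init. */
__attribute__((constructor))
static void callback_wrapper(void) {
    if (callback_ctor_stub)
        real_callback();
}

/* Hypothetical dispatcher: set the flag, then call the wrapper on purpose. */
static void run_callbacks(void) {
    callback_ctor_stub = 1;
    callback_wrapper();
}
```

The wrapper runs once at startup with the flag cleared and does nothing; only the later, deliberate call reaches the real callback.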

test.c:

A library that registers a module called test, and calls its callbacks at some point

#include "callback.h"

CALLBACK_LIST(test);

void do_something_and_call_the_callbacks(void) {

        // ... doing something here ...

        CALLBACKS(test);

        // ... doing something else ...
}

callme1.c:

Client code registering two callbacks for module test. The functions have no user-visible name (they do have one, but it is generated by the macro to be unique inside the compilation unit)

#include <stdio.h>
#include "callback.h"

CALLBACK(test) {
        printf("%s: %s\n", __FILE__, __FUNCTION__);
}

CALLBACK(test) {
        printf("%s: %s\n", __FILE__, __FUNCTION__);
}

void callme1(void) {} // stub to be called in the test sample to include the compilation unit. Not needed in real code...

callme2.c:

Client code registering another callback for module test...

#include <stdio.h>
#include "callback.h"

CALLBACK(test) {
        printf("%s: %s\n", __FILE__, __FUNCTION__);
}

void callme2(void) {} // stub to be called in the test sample to include the compilation unit. Not needed in real code...

callback.h:

And the magic...

#ifndef __CALLBACK_H__
#define __CALLBACK_H__

#ifdef __cplusplus
extern "C" {
#endif

typedef void (* callback)(void);
int __attribute__((weak)) _callback_ctor_stub = 0;

#ifdef __cplusplus
}
#endif

#define _PASTE(a, b)    a ## b
#define PASTE(a, b)     _PASTE(a, b)

#define CALLBACK(module) \
        static inline void PASTE(_ ## module ## _callback_, __LINE__)(void); \
        static void PASTE(_ ## module ## _callback_ctor_, __LINE__)(void); \
        static __attribute__((section(".ctors.callback." #module "$2"))) __attribute__((used)) const callback PASTE(__ ## module ## _callback_, __LINE__) = PASTE(_ ## module ## _callback_ctor_, __LINE__); \
        static void PASTE(_ ## module ## _callback_ctor_, __LINE__)(void) { \
                 if(_callback_ctor_stub) PASTE(_ ## module ## _callback_, __LINE__)(); \
        } \
        inline void PASTE(_ ## module ## _callback_, __LINE__)(void)

#define CALLBACK_LIST(module) \
        static __attribute__((section(".ctors.callback." #module "$1"))) const callback _ ## module ## _callbacks_start[0] = {}; \
        static __attribute__((section(".ctors.callback." #module "$3"))) const callback _ ## module ## _callbacks_end[0] = {}

#define CALLBACKS(module) do { \
        const callback *cb; \
        _callback_ctor_stub = 1; \
        for(cb =  _ ## module ## _callbacks_start ; cb <  _ ## module ## _callbacks_end ; cb++) (*cb)(); \
} while(0)

#endif

main.c:

If you want to give it a try... this is the entry point for a standalone program (tested and working on gcc-cygwin)

void do_something_and_call_the_callbacks(void);

int main() {
    do_something_and_call_the_callbacks();
}

output:

This is the (relevant) output on my embedded device. The function names are generated in callback.h and can have duplicates, as the functions are static

app/callme1.c: _test_callback_8
app/callme1.c: _test_callback_4
app/callme2.c: _test_callback_4

And in CygWIN...

$ gcc -c -o callme1.o callme1.c
$ gcc -c -o callme2.o callme2.c
$ gcc -c -o test.o test.c
$ gcc -c -o main.o main.c
$ gcc -o testme test.o callme1.o callme2.o main.o
$ ./testme
callme1.c: _test_callback_4
callme1.c: _test_callback_8
callme2.c: _test_callback_4

linker map:

This is the relevant part of the map file generated by the linker

 *(SORT(.ctors.*))
 .ctors.callback.test$1    0x4024f040    0x0    .build/testme.a(test.o)
 .ctors.callback.test$2    0x4024f040    0x8    .build/testme.a(callme1.o)
 .ctors.callback.test$2    0x4024f048    0x4    .build/testme.a(callme2.o)
 .ctors.callback.test$3    0x4024f04c    0x0    .build/testme.a(test.o)

This is quite elegant, I wish I could star it. I like the fact that it only needs a sorted section in linker script (and no start/end markers), which is the same if you use one on multiple "callbacks". — domen, Aug 15 '16 at 10:52
Nice, this actually looks like a fairly elegant solution to my problem. I only saw this now (somehow stackoverflow isn't sending me e-mail notifications for new answers) while Googling for some other related issue and coming across my own question :-) — Matthijs Kooijman, Aug 30 '16 at 13:35
Coming back to this answer for another project, thanks again. I looked closer at the AVR linker scripts, which seem to not have the required sorted constructor wildcard section (only a single `.ctor` section sorted by compilation unit name: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/scripttempl/avr.sc;h=b85748b2f92c0444f0e98fcdf8b9a837227499f1;hb=HEAD#l149). Seems I won't get away without linker script changes, then... — Matthijs Kooijman, Aug 21 '20 at 12:03
I also had two questions about your solution: 1) Wouldn't inserting things into the `.ctors` section cause them to be ran at startup? 2) Why this complication with a two-layer callback function that only runs when `_callback_ctor_stub` is set? But then I realized that these answer each other: Anything in `.ctors` is ran at startup, so the `_callback_ctor_stub` makes sure that the actual callbacks are *not* ran at startup, only when they are actually needed. Clever fix, though it again adds a bit of memory overhead (which might be problematic in RAM, not so much in flash). — Matthijs Kooijman, Aug 21 '20 at 12:03

score 1 · Answer 2 · answered Jun 18 '14 at 11:52

Try to solve the actual problem. What you need are multiple callback functions, that are defined in various modules, that aren't in the slightest related to each other.

What you have done though, is to place a global variable in a header file, which is accessible by every module including that header. This introduces a tight coupling between all such files, even though they are not related to each other. Furthermore, it seems only the callback handler .c function needs to actually call the functions, yet they are exposed to the whole program.

So the actual problem here is the program design and nothing else.

And there is actually no apparent reason why you need to allocate this array at compile time. The only sane reason would be to save RAM, but that is of course a valid reason for an embedded system. In which case the array should be declared as const and initialized at compile time.

You can keep something similar to your design if storing the array as read-write objects. Or if the array must be a read-only one for the purpose of saving RAM, you must do a drastic re-design.

I'll give both versions, consider which one is most suitable for your case:

RAM-based read/write array

(Advantage: flexible, can be changed in runtime. Disadvantages: RAM consumption. Slight over-head code for initialization. RAM is more exposed to bugs than flash.)

Let callback.h and callback.c form a module which is only concerned with the handling of the callback functions. This module is responsible for how the callbacks are allocated and when they are executed.
In callback.h define a type for the callback functions. This should be a function pointer type just as you have done. But remove the variable declaration from the .h file.

In callback.c, declare the callback array of functions as

 static callback_t callbacks [LARGE_ENOUGH_FOR_WORST_CASE];

There is no way you can avoid "LARGE_ENOUGH_FOR_WORST_CASE". You are on an embedded system with limited RAM, so you have to actually consider what the worst-case scenario is and reserve enough memory for that, no more, no less. On a microcontroller embedded system, there are no such things as "usually needed" nor "lets save some RAM for other processes". Your MCU either has enough memory to cover the worst case scenario, or it does not, in which case no amount of clever allocations will save you.
In callback.c, declare a size variable that keeps track of how much of the callback array has been initialized: static size_t callback_size;
Write an init function void callback_init(void) which initializes the callback module. The prototype should be in the .h file and the caller is responsible for executing it once, at program startup.
Inside the init function, set callback_size to 0. The reason I propose to do this in runtime is because you have an embedded system where a .bss segment may not be present or even undesired. You might not even have a copy-down code that initializes all static variables to zero. Such behavior is non-conformant with the C standard but very common in embedded systems. Therefore, never write code which relies on static variables getting automatically initialized to zero.
Write a function void callback_add (callback_t callback); (callback_t is already a function pointer type, so it is passed by value). Every module that uses your callback module will call this function to add its specific callback functions to the list.
Keep your do_callbacks function as it is (though as a minor remark, consider renaming to callback_traverse, callback_run or similar).
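The steps above can be sketched as a single file. This is a minimal illustration under assumptions, not a definitive implementation: the value 8 for LARGE_ENOUGH_FOR_WORST_CASE is arbitrary, the int return value of callback_add (0 on success, -1 when the array is full) is an addition, and demo_callback is a hypothetical stand-in for a client module's callback.

```c
#include <stddef.h>

/* Assumed value for illustration; size it for your own worst case. */
#define LARGE_ENOUGH_FOR_WORST_CASE 8

typedef void (*callback_t)(void);

/* Private to the callback module: the array and its fill level. */
static callback_t callbacks[LARGE_ENOUGH_FOR_WORST_CASE];
static size_t callback_size;

/* Must be called once at program startup; does not rely on .bss zeroing. */
void callback_init(void) {
    callback_size = 0;
}

/* Adds one callback. Returns 0 on success, -1 if the array is full
 * (the return value is an addition, the text only specifies void). */
int callback_add(callback_t callback) {
    if (callback_size >= LARGE_ENOUGH_FOR_WORST_CASE)
        return -1;
    callbacks[callback_size++] = callback;
    return 0;
}

/* Runs every registered callback in registration order. */
void callback_run(void) {
    for (size_t i = 0; i < callback_size; ++i)
        callbacks[i]();
}

/* Hypothetical client callback, standing in for a real module. */
static int demo_hits = 0;
static void demo_callback(void) { ++demo_hits; }
```

A client would call callback_init() once at startup, then callback_add(demo_callback) from its own init code, and the owner of the event would call callback_run().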

Flash-based read-only array

(Advantages: saves RAM, true read-only memory safe from memory corruption bugs. Disadvantages: less flexible, depends on every module used in the project, possibly slightly slower access because it's in flash.)

In this case, you'll have to turn the whole program upside-down. By the nature of compile-time solutions, it will be a whole lot more "hard-coded".

Instead of having multiple unrelated modules including a callback handler module, you'll have to make the callback handler module include everything else. The individual modules still don't know when a callback will get executed or where it is allocated. They just declare one or several functions as callbacks. The callback module is then responsible for adding every such callback function to its array at compile-time.

// callback.c

#include "timer_module.h"
#include "spi_module.h"
...

static const callback_t CALLBACKS [] = 
{
  &timer_callback1,
  &timer_callback2,
  &spi_callback,
  ...
};

The advantage of this is that you'll automatically get the worst case scenario handed to you by your own program. The size of the array is now known at compile time, it is simply sizeof(CALLBACKS)/sizeof(callback_t).
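A self-contained sketch of the traversal, assuming the callbacks are plain void functions: the three callback bodies here are hypothetical stand-ins for what timer_module.h and spi_module.h would declare, and CALLBACK_COUNT is a name introduced only for this example.

```c
#include <stddef.h>

typedef void (*callback_t)(void);

/* Stand-ins for the callbacks that the individual modules would provide. */
static int timer_hits = 0;
static int spi_hits = 0;
static void timer_callback1(void) { ++timer_hits; }
static void timer_callback2(void) { ++timer_hits; }
static void spi_callback(void)    { ++spi_hits; }

/* The complete list, fixed at compile time and read-only. */
static const callback_t CALLBACKS[] = {
    &timer_callback1,
    &timer_callback2,
    &spi_callback,
};

/* Element count derived by the compiler; nothing to maintain by hand. */
#define CALLBACK_COUNT (sizeof(CALLBACKS) / sizeof(CALLBACKS[0]))

/* Runs every callback in the order it is listed in the array. */
void do_callbacks(void) {
    for (size_t i = 0; i < CALLBACK_COUNT; ++i)
        CALLBACKS[i]();
}
```

Adding or removing a module means editing this one array; the element count follows automatically.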

Of course this isn't nearly as elegant as the generic callback module. You get a tight coupling from the callback module to every other module in the project, but not the other way around. Essentially, the callback.c is a "main()".

You can still use a function pointer typedef in callback.h though, but it is no longer actually needed: the individual modules must ensure that they have their callback functions written in the desired format anyhow, with or without such a type present.

This is a C solution but a C++ one will follow the very same principles. Just wrap everything up in classes and use more elegant private encapsulation, otherwise it will be the same thing. Though be aware that the concerns for static initialization even more so apply to C++ class objects with static storage duration: the constructors might not get executed, or you might not want to have them executed at start up. — Lundin, Jun 18 '14 at 11:56
Thanks for your thorough answer! However, neither solution really solves my problem. My problem mostly occurs while writing library code, which should be generic and decoupled. The second solution obviously breaks this decoupling. However, the first solution also introduces a coupling between the declaration of the array and the (number of) modules supplying callbacks. It's only a small coupling, but it still requires manually updating the array size when adding or removing a module (which I'd like to avoid) or really over-sizing the array (which is also not what I want). — Matthijs Kooijman, Jun 25 '14 at 11:16
@MatthijsKooijman As I wrote, there is no way you can avoid `array[LARGE_ENOUGH_FOR_WORST_CASE]`. There is no such thing as over-sizing: the array is either large enough or it is too small. If you want that constant to be application-specific, you could declare it as `extern const` in your .h file and thereby demand that the application implements it. — Lundin, Jun 25 '14 at 11:24
The constant could also depend on callbacks in other libraries which might or might not be included, so I really want to avoid having to manually count the number of callbacks... — Matthijs Kooijman, Jun 28 '14 at 09:40

score 0 · Answer 3 · answered Sep 08 '15 at 23:51

I too am faced with a similar problem:

...need are multiple callback functions, that are defined in various modules, that aren't in the slightest related to each other.

Mine is C, on Atmel XMega processor. You mentioned that you are using GCC. The following doesn't solve your problem, it is a variant on the above #1 solution. It exploits the __attribute__((weak)) directive.

1) For each optional module, have a unique (per module name) but similar (per purpose) callback function. E.g.

fooModule.c:
void foo_eventCallback(void) {
    // do the foo response here
}

barModule.c:
void bar_eventCallback(void) {
    // do the bar response here
}

yakModule.c:
void yak_eventCallback(void) {
    // do the yak response here
}

2) Have a callback start point that looks something like:

__attribute__((weak)) void foo_eventCallback(void) { }
__attribute__((weak)) void bar_eventCallback(void) { }
__attribute__((weak)) void yak_eventCallback(void) { }

void functionThatExcitesCallback(void) {
    foo_eventCallback();
    bar_eventCallback();
    yak_eventCallback();
}

The __attribute__((weak)) qualifier basically creates a default implementation with an empty body, which the linker will replace with a different variant IF it finds a non-weak variant by the same name. It doesn't make it completely decoupled, unfortunately. But you can at least put this big super-set-of-all-callbacks in one and only one place, and not get into header file hell with it. And then your different compilation units basically replace the subsets of the superset that they want to. I would love it if there was a way to do this with using the same named function in all modules and just have those called based on what's linked, but haven't yet found something that does that.
