I'm developing an embedded application in C++ for a Cortex-M3 with GCC 4.8 from the GNU ARM toolchain. The application uses some singletons that are instantiated via a function-local static variable, like this (real code):

GlobalDataTypeRegistry& GlobalDataTypeRegistry::instance()
{
    static GlobalDataTypeRegistry inst;
    return inst;
}

This is a classic approach for implementing singletons in C++. The problem is that the output code size explodes as soon as I use this kind of instantiation, which obviously means that the compiler/linker adds some support code for proper initialization and destruction of the singleton object.

Here is a minimal example that reproduces the issue.

This will compile into 66k of code (-Os):

struct A
{
    A()  { __asm volatile ("nop"); }
    ~A() { __asm volatile ("nop"); }
};

A& getA()
{
    static A a;
    return a;
}

int main()
{
    (void)getA();
    return 0;
}

This will compile into 9k of code (-Os):

struct A
{
    A()  { __asm volatile ("nop"); }
    ~A() { __asm volatile ("nop"); }
};

static A a;  // Extracted from the function scope
A& getA()
{
    return a;
}

int main()
{
    (void)getA();
    return 0;
}

If the line (void)getA(); is commented out, the final binary is only about 4k.

The question is: what options do I have to avoid the extra 62k of code for this singleton, aside from moving the static variable out of the function scope? Are there any options to tell GCC that it is not necessary to call the singleton's destructor on application exit (since the application never exits anyway)? Are there any other ways to optimize this?

Pavel Kirienko
  • You could use `-S` to look at the assembler, and what is different between the two versions. (One thing is different: in the first version, the compiler must protect against multiple calls from different threads, to still ensure that the object is only initialized once. I can't imagine that taking so much space, however.) – James Kanze Apr 10 '14 at 10:45
  • How does GCC ensure thread safety on an embedded system, where the thread safety primitives (Mutexes) are not available for the compiler? – Pavel Kirienko Apr 10 '14 at 10:55
  • I don't know. Maybe it doesn't support multiple threads on such a system. Or maybe it implements some sort of mechanism itself (which could account for the increased size). – James Kanze Apr 10 '14 at 11:23

2 Answers

You could create your singleton with placement new inside a buffer implemented with std::aligned_storage.
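
A minimal sketch of this idea, assuming C++11 is available (for example -std=c++11 with GCC 4.8). The `value` member and the null-pointer check are illustrative additions, not part of the original code. Both the storage and the pointer are constant-initialized, so the compiler should have no reason to emit an initialization guard or to register a destructor to run at exit; the destructor is simply never called. Note that the first call is not protected against concurrent access.

```cpp
#include <new>          // placement new
#include <type_traits>  // std::aligned_storage

struct A
{
    A() : value(42) {}  // 'value' is a hypothetical member, used only for illustration
    ~A() {}             // never invoked in this scheme
    int value;
};

A& getA()
{
    // Raw, suitably aligned storage: trivially initialized, no destructor to run.
    static std::aligned_storage<sizeof(A), alignof(A)>::type storage;

    // Constant-initialized pointer doubles as the "already constructed" flag.
    static A* instance = nullptr;

    if (instance == nullptr) {
        // Construct the object in place on first use (not thread-safe).
        instance = new (&storage) A();
    }
    return *instance;
}
```

Calling `getA()` repeatedly returns a reference to the same object. Whether this actually shrinks the binary on a given toolchain is best confirmed by comparing the linker map files.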

Jan Herrmann

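Add the -fno-threadsafe-statics option to the g++ command line, and your code size will be reduced. The option tells GCC not to emit the thread-safe guard code around the initialization of function-local statics, so it is only appropriate when the first call cannot race with another thread or an interrupt handler.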

Here is my example code:

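class X {
private:
    X() { }

public:
    ~X() { }

    static X* get_instance() {
        // Function-local static: guarded initialization unless
        // -fno-threadsafe-statics is given.
        static X instance;
        return &instance;
    }

    void show() {
        asm("");  // empty asm statement as a placeholder body
    }
};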


int main() {
    X* temp = X::get_instance();
    temp->show();

    while (true) {
        asm("");
    }
}

so61pi