
What are the best ways to declare and define global constants in C++? I am mostly interested in the C++11 standard, as it fixes a lot in this regard.

[EDIT (clarification)]: in this question, "global constant" denotes a constant variable or function that is known at compile time, in any scope. A global constant must be accessible from more than one translation unit. It is not necessarily a constexpr-style constant; it can be something like const std::map<int, std::string> m = { { 1, "U" }, { 5, "V" } }; or const std::map<int, std::string> * mAddr() { return & m; }. I do not address the preferable scope or naming style for constants in this question. Let us leave these matters for another question. [END_EDIT]

I want to know answers for all the different cases, so let us assume that T is one of the following:

typedef    int                     T;  // 1
typedef    long double             T;  // 2
typedef    std::array<char, 1>     T;  // 3
typedef    std::array<long, 1000>  T;  // 4
typedef    std::string             T;  // 5
typedef    QString                 T;  // 6
class      T {
   // unspecified amount of code
};                                     // 7
// Something special
// not mentioned above?                // 8

I believe that there is no big semantic (I do not discuss good naming or scope style here) difference between the 3 possible scopes:

// header.hpp
extern const T tv;
T tf();                  // Global
namespace Nm {
    extern const T tv;
    T tf();              // Namespace
}
struct Cl {
    static const T tv;
    static T tf();       // Class
};

But if choosing better way from alternatives below depends on the difference between above declaration scopes, please point it out.

Consider also the case when a function call is used in the constant's definition, e.g. <some value> == f(). How would calling a function in the constant's initialization influence the choice between the alternatives?

  1. Let us consider T with constexpr constructor first. Obvious alternatives are:

    // header.hpp
    namespace Ns {
    constexpr T A = <some value>;
    constexpr T B() { return <some value>; }
    inline const T & C() { static constexpr T t = <some value>; return t; }
    const T & D();
    }
    
    // source.cpp
    const T & Ns::D() { static constexpr T t = <some value>; return t; }
    

    I believe that A and B are most suitable for small T (such that having multiple instances or copying it at runtime is not a problem), e.g. 1-3, sometimes 7. C and D are better if T is large, e.g. 4, sometimes 7.

  2. T without constexpr constructor. Alternatives:

    // header.hpp
    namespace Ns {
    extern const T a;
    inline T b() { return <some value>; }
    inline const T & c() { static const T t = <some value>; return t; }
    const T & d();
    }
    
    // source.cpp
    extern const T Ns::a = <some value>;
    const T & Ns::d() { static const T t = <some value>; return t; }
    

    I would not normally use a because of static initialization order fiasco. As far as I know, b, c and d are perfectly safe, even thread-safe since C++11. b does not seem to be a good choice unless T has a very cheap constructor, which is uncommon for non-constexpr constructors. I can name one advantage of c over d - no function call (run-time performance); one advantage of d over c - less recompiling when constant's value is changed (these advantages also apply to C and D). I am sure that I missed a lot of reasoning here. Provide other considerations in answers please.
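
To make the trade-off between b and c concrete, here is a minimal single-file sketch. It assumes T is std::string and uses "U" as a placeholder value; the helper c_returns_same_object is hypothetical and only there to show that c hands out one shared object, while b builds a new one per call.

```cpp
#include <cassert>
#include <string>

namespace Ns {
// Assumed stand-in for T: std::string has no constexpr constructor in C++11.
typedef std::string T;

// Alternative b: a fresh T is constructed on every call.
inline T b() { return "U"; }

// Alternative c: a single T, constructed on the first call. Since C++11
// the initialization of a function-local static is thread-safe.
inline const T& c() {
    static const T t = "U";
    return t;
}
}  // namespace Ns

// Hypothetical helper: true when two calls to c() refer to the same object.
inline bool c_returns_same_object() {
    const Ns::T* first = &Ns::c();
    const Ns::T* second = &Ns::c();
    assert(first != nullptr);
    return first == second;
}
```

Calling Ns::b() in a loop pays for a std::string construction each time, whereas Ns::c() only pays for the "already initialized" check after the first call.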

If you want to modify / test the above code, you can use my test files (just header.hpp, source.cpp with compilable versions of above code fragments and main.cpp that prints constants from header.hpp): https://docs.google.com/uc?export=download&id=0B0F-aqLyFk_PVUtSRnZWWnd4Tjg

vedg
  • We have terminology for discussing these things: "scope", "lifetime", "storage class", "linkage". I suggest you familiarize yourself with their meanings, you will probably find out what you wanted to know from their definitions, and if not, you can edit your question with the right terminology so that it becomes clear. – Ben Voigt May 14 '14 at 15:22
  • I think that “scope”, “lifetime” and “storage class” are only slightly related to the question. I have read about “linkage”, learned something new, but still do not see anything wrong with suggested alternatives. I agree that my question is not very clear, though not because of unused terminology but because I did not explain mentioned advantages well enough. I just have a feeling that my question is already long enough. Seems that I would have to make it even more verbose if no one provides a good answer in the nearest future. – vedg May 14 '14 at 19:20
  • keep in mind that `inline` will not guarantee the function is actually `inline`d by the compiler. – YoungJohn May 15 '14 at 21:19
  • Doesn't `extern const` disobey your requirement of "known at compile time"? How can a `constexpr` constructor possibly be "expensive"? – aschepler May 15 '14 at 22:16
  • @YoungJohn, yes, not always. But I trust modern compiler to do the right thing in this case. So if function would not be inlined, then it is probably better that way. – vedg May 16 '14 at 13:37
  • @aschepler, I would say that `extern const std::string str = "my const";` is *known* at compile time. But it can not be constructed at compile time. Of course it is possible that `extern const` is actually not a constant, e.g. `extern const std::time_t t = std::time(nullptr);`. In this case `t` would be different between launches. But I do not consider `t` a global constant. So, I consider only subset of all possible `extern const` variables in my question. – vedg May 16 '14 at 13:44
  • As for `expensive constexpr constructor`: thanks, I have rephrased this part of the question and also added namespace around constants so as not to proliferate global-variables-style and silence outrage over such a bad style. – vedg May 16 '14 at 13:46

3 Answers


"I believe that there is no big difference between the following declaration locations:"

This is wrong in a lot of ways.

The first declaration pollutes the global namespace; you have taken the name "tv" from ever being used again without the possibility of misunderstandings. This can cause shadowing warnings, it can cause linker errors, it can cause all sorts of confusion to anyone who uses your header. It can also cause problems to someone who doesn't use your header, by causing a collision with someone else who also happens to use your variable name as a global.

Such an approach is not recommended in modern C++, but is ubiquitous in C, and therefore leads to much use of the static keyword for "global" variables in a .c file (file scope).

The second declaration pollutes a namespace; this is much less of an issue, as namespaces are freely renamable and can be made at no cost. As long as two projects use their own, relatively specific namespace, no collisions will occur. In the case where such collisions do occur, the namespaces for each can be renamed to avoid any issues.

This is more modern, C++03 style, and C++11 expands this tactic considerably with renaming of templates.

The third approach is a struct, not a class; they have differences, especially if you want to maintain compatibility with C. The benefits of a class scope compound on the namespace scope; not only can you easily encapsulate multiple things and use a specific name, you can also increase encapsulation via methods and information hiding, greatly expanding how useful your code is. This is mostly the benefit of classes, irrespective of scoping benefits.

You should almost certainly not use the first one, unless your functions and variables are very broad and STL/STD like, or your program is very small and not likely to be embedded or reused.

Let's now look at your cases.

  1. The size of the constructor, if it returns a constant expression, is unimportant; all of the code ought to be executable at compile time. This means the complexity is not meaningful; it will always compile to a single, constant, return value. You should almost certainly never use C or D; all that does is make the optimizations of constexpr not work. I would use whichever of A and B looks more elegant, probably a simple assignment would be A, and a complex constant expression would be B.

  2. None of these are necessarily thread safe; the content of the constructor would determine both thread and exception safety, and it is quite easy to make any of these statements not thread safe. In fact, a is most likely to be thread safe; as long as the object is not accessed until main is called, it should be fully formed; the same cannot be said of any of your other examples. As for your analysis of b, in my experience, most constructors (especially exception safe ones) are cheap as they avoid allocation. In such cases, there's unlikely to be much difference between any of your cases.
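
For the first point, a minimal sketch of what distinguishes A and B from C in practice. The literal type T, its values and the names buffer and c_is_single_object are assumptions made for illustration. A and B can be used wherever the language demands a constant expression; C is an ordinary function whose result exists only at run time.

```cpp
#include <cassert>

namespace Ns {
// Hypothetical literal type standing in for T.
struct T {
    int x, y;
    constexpr T(int x_, int y_) : x(x_), y(y_) {}
};

constexpr T A(1, 5);                  // alternative A
constexpr T B() { return T(1, 5); }   // alternative B
inline const T& C() {                 // alternative C
    static constexpr T t(1, 5);
    return t;
}
}  // namespace Ns

// A and B are usable where a constant expression is required.
static_assert(Ns::A.x + Ns::B().y == 6, "A and B are compile-time constants");
int buffer[Ns::B().y];                // array bound taken from B()

// C() cannot appear in the contexts above, but every call refers to the
// same object.
inline bool c_is_single_object() {
    assert(Ns::C().y == 5);
    return &Ns::C() == &Ns::C();
}
```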

I would highly recommend you stop attempting micro-optimizations like this and perhaps get a more solid understanding of C++ idioms. Most of the things you are trying to do here are unlikely to result in any increase in performance.

Alice
  • About declaration locations: yes, I knew these scope differences but asked if they influence guidelines concerning using constants. – vedg May 14 '14 at 18:12
  • 1. constexpr object is not always not-a-variable, it can refer to the location in memory (see http://stackoverflow.com/questions/13865842/does-static-constexpr-variable-make-sense). One can take address of constexpr object. Considering this, there is a difference between `A` and `B`: `A` has internal linkage, so it can be constructed in each translation unit that includes header.hpp. This is not good for big objects like `4`. – vedg May 14 '14 at 18:32
  • I am afraid that in case of `B` there are also issues with big objects: what if you need to pass `const T &` to some function? `constexpr T t = B(); f(t);` - would `T` be copy-constructed at run-time here? That is why I have suggested `C` and `D` for expensive-to-construct objects. – vedg May 14 '14 at 18:32
  • 2. `b`, `c` and `d` are thread-safe since C++11 for reasonably written `T` constructors that do not call one of these functions (`b`, `c`, `d`). This is not difficult to guarantee. Other than this I do not see anything unsafe about these alternatives to `a`. Static local variables are well described here: http://en.cppreference.com/w/cpp/language/storage_duration (note the "since C++11" section). – vedg May 14 '14 at 18:45
  • This question is about safe constants in the first place. After reading your answer, it seems that `a` is the way to go, but it is often unsafe and dangerous (http://www.parashift.com/c++-faq/static-init-order.html). After safety is taken care of, it is good to weigh compile- and run-time performance. Since global constants are commonly used, it is reasonable to form certain guidelines - where to use each form. I could not find satisfying guidelines and asked this question. So my question is exactly about C++ idioms. – vedg May 14 '14 at 19:01
  • 2. b, c, and d are NOT thread safe; there is absolutely nothing that prohibits you from writing a non-thread-safe constructor using that idiom. It is, in fact, very easy to write a non-thread-safe constructor using C and D; this is one of the issues with lazy singletons using these idioms. You say "reasonable well written" as if it means anything; either a technique is thread safe, or it is not, or it makes no guarantee. In these cases, there is no guarantee. – Alice May 14 '14 at 21:56
  • 1. You are misunderstanding the point of being able to take the address of a constant expression object; it does not necessarily refer to a location of memory (the compiler is free to optimize away this indirection and can do so in many circumstances). If a constant expression object is not optimized away at compile time, you are probably doing something wrong. A constant expression object should not matter how expensive it is, because it should be optimized away. – Alice May 14 '14 at 22:00
  • 3. A is not unsafe or dangerous; it is only dangerous if you need some specified ordering, in which case you should use a wrapper global object to enforce that order. This is not "unsafe", any more than malloc/free are unsafe; if you use something wrong or depend on constraints which do not exist, you are at fault, not the language. – Alice May 14 '14 at 22:02
  • Finally, I take issue with "global constants are commonly used"; this is not particularly true, and often indicative of bad design. Generally I've seen people use singletons rather than global variables (though singletons can also introduce their own issues). Better design is to avoid global state as much as possible, no matter how well hidden. – Alice May 14 '14 at 22:04
  • I believe that constructor is either trivial or well enough written to be thread-safe in this case for `1-6`. And it is not difficult to write conforming constructor for custom class. Because, unless that constructor does something peculiar like accessing global variables, static constant in a function would be safe according to C++11 standard. 1. I still doubt very much that object like `4` would be always optimized away but I have no evidence for this; on the other hand, you also did not provide any evidence or link to support your stand. – vedg May 15 '14 at 07:58
  • I prefer not to think about possible static initialization order issues every time I change my code. Access functions provide easy and cheap (either in compilation time or in run-time performance, not both) solution. Functions also avoid other issues - slow application start-up, initialization of unused variables. – vedg May 15 '14 at 08:00
  • By “global constants” I mean constant (like in `A` and `a`) or one of the suggested alternative functions. Also, as indicated in my question, I do not see relevant-to-the-question difference between global namespace, named namespace and class scope. Even though all my alternatives are plain global variables / functions, they can (and usually should) be moved to namespace or class scope. Do you believe that global constant in any form is a bad design? What about `std::numeric_limits`, `pi` (user-defined function template)? – vedg May 15 '14 at 08:00
  • I don't understand your question; numeric_limits is not global. It's a templated class, in the standard namespace. No part of it is global; it's both confined into a namespace, as well as inside a class, as well as in instantiate scope; that's three levels of indirection from anything resembling global. Also, much of it is not variables, but methods, another layer of indirection. Furthermore, yes, any global state makes programs harder to reason about. Much like use of goto, global state is something to be minimized. – Alice May 15 '14 at 10:18
  • If you do not see the relevance, then I don't know how to explain it to you; in C++, we attempt to encapsulate and remove exposure of internal details as much as is possible. Your question was about C++ idioms; if you see no difference between the statements, then you don't understand modern C++. – Alice May 15 '14 at 10:19
  • In this question under "global constant" I understand constant variable or function that is known at compile time in *any* scope. My alternatives are just examples, not good scope guidelines. What you write about good style with scopes is certainly correct, but I was trying to generalize constant-related guidelines regardless of scope. So I suggest you do not mention scope again unless it would influence which alternative (variable or function) you choose. I also ask you not to challenge my short variable / function names - I do not assert that such short names represent a good naming style :) – vedg May 15 '14 at 10:30
  • Good answer, except for one mistake: class = struct in C++ (except for default member and base visibility). You can even forward-declare a `struct` and later define it as `class` (and vice-versa). – Konrad Rudolph May 15 '14 at 14:36
  • @KonradRudolph Right, which are the differences between struct and class that I specifically target as important. – Alice Nov 08 '14 at 15:07

You didn't mention an important option:

namespace
{
    const T t = .....;
}

Now there are no name collision issues.

This isn't appropriate if T is something you only want to construct once. But having a large "global" object, const or not, is something you really want to avoid. It breaks encapsulation, and also introduces the static initialization order fiasco into your code.

I've never had the need for a large extern const object. If I need a large hardcoded lookup table for example, then I write a function (perhaps as a class member) that looks up the table; and the table is local to the unit with the implementation of that function.
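
A minimal sketch of that arrangement, with made-up names (kPrimes, kPrimeCount, nth_prime) and a toy table: the data sits in an unnamed namespace inside one source file, and only the lookup function would be declared in the header.

```cpp
#include <cassert>
#include <cstddef>

// table.cpp (hypothetical file): the table has internal linkage and is
// visible only inside this translation unit.
namespace {
const int kPrimes[] = {2, 3, 5, 7, 11, 13};
const std::size_t kPrimeCount = sizeof(kPrimes) / sizeof(kPrimes[0]);
}  // namespace

// Public lookup function; only its declaration would go into the header.
// Precondition: n < kPrimeCount.
int nth_prime(std::size_t n) {
    assert(n < kPrimeCount);
    return kPrimes[n];
}
```

Because the table is initialized from constant expressions, it is set up before any dynamic initialization runs, so callers do not depend on initialization order.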

In my code that seems to call for a large non-const global object, I actually have a function,

namespace MyStuff
{
     T &get_global_T();
}

which constructs the object on first use. (Actually, the object itself is hidden in one unit, and T is a helper class that specifies an interface; so I can mess around with the object's details and not disturb any code that is using it).
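
One way this could look, as a sketch only: the members value and set_value and the class Impl are invented for illustration and are not taken from the code described above. T is a pure interface, the concrete object is hidden in one source file, and get_global_T constructs it on the first call.

```cpp
#include <cassert>

namespace MyStuff {
// Hypothetical interface; callers see only this class.
class T {
public:
    virtual ~T() {}
    virtual int value() const = 0;
    virtual void set_value(int v) = 0;
};

T& get_global_T();  // what the header would declare
}  // namespace MyStuff

// --- the rest would live in a single .cpp file ---
namespace {
// Concrete object, invisible outside this translation unit.
class Impl : public MyStuff::T {
public:
    Impl() : value_(0) {}
    int value() const override { return value_; }
    void set_value(int v) override { value_ = v; }
private:
    int value_;
};
}  // namespace

MyStuff::T& MyStuff::get_global_T() {
    static Impl instance;  // constructed on first call
    return instance;
}
```

Changing Impl then requires recompiling only that one source file.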

M.M
  • Yes, unnamed namespace in header is an interesting idea. It is almost the same as (but preferable to) `static const T t = ...;` in namespace scope, and enforces internal linkage. I definitely prefer external linkage in case of constexpr objects. Not so sure about simple const... I suppose this way can compete with `b` if safety requirements for `c` and `d` can not be satisfied but `T` is large and used often. – vedg May 16 '14 at 14:02
  • Otherwise I do not see much use for this alternative because internal linkage would cause having as many copies of the constant as the number of translation units that include header.hpp. "the table is local to the unit with the implementation of that function" - it won't save you from static initialization order fiasco if that table is declared as `const Table table=...;` and that function happens to be called from global object constructor or destructor. – vedg May 16 '14 at 14:06
  • "as many copies of the constant " - it could be optimized out, and even if it isn't - so what? – M.M May 17 '14 at 04:11
  • If it isn't, it would waste memory, but that is not my primary concern. I am afraid that many copies can also cause performance issues: there would be several identical constants and they would be replaced by each other in processor cache. – vedg May 17 '14 at 06:49
  • Also variables in unnamed namespace can potentially violate ODR if used in inline functions. See [this paper](http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4147.pdf) for example. – vedg Apr 29 '15 at 19:31

In case A there is a difference between global or namespace scope (internal linkage) and class scope (external linkage). So

// header.hpp
constexpr T A = <some value>; // internal linkage
namespace Nm { constexpr T A = <some value>; } // internal linkage
class Cl { public: static constexpr T A = <some value>; }; // not enough!

Consider the following usage:

// user.cpp
std::cout << A << Nm::A << Cl::A; // ok
std::cout << &A << &Nm::A;        // ok
std::cout << &Cl::A;              // linker error: undefined reference to `Cl::A'

Placing Cl::A definition in source.cpp (in addition to the above Cl::A declaration) eliminates this error:

// source.cpp
constexpr T Cl::A;

External linkage means that there would always be only one instance of Cl::A. So Cl::A seems to be a very good candidate for large T. However: can we be sure that static initialization order fiasco would not present itself in this case? I believe that the answer is yes, because Cl::A is constructed at compile-time.
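
The declaration/definition pair can be sketched in a single file as follows. The aggregate T and the helper address_of_A are assumptions for illustration. Under C++11 and C++14 rules the out-of-class definition is what gives Cl::A its one address; since C++17 static constexpr members are implicitly inline, so that line becomes redundant but is still accepted.

```cpp
#include <cassert>

// Hypothetical literal type standing in for a large T.
struct T {
    long data[4];
};

// header.hpp
class Cl {
public:
    static constexpr T A = {{1, 2, 3, 4}};  // declaration with initializer
};

// source.cpp: the definition that C++11/14 require once Cl::A is odr-used,
// for example when its address is taken.
constexpr T Cl::A;

// Hypothetical helper: taking the address odr-uses Cl::A.
inline const T* address_of_A() {
    const T* p = &Cl::A;
    assert(p != nullptr);
    return p;
}
```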

I have tested A, B, a alternatives with g++ 4.8.2 and 4.9.0, clang++ 3.4 on GNU/Linux platform. The results for three translation units:

  • A in class scope with definition in source.cpp was both immune to fiasco and had the same address in all translation units even at compile-time.
  • A in namespace or global scope had 3 different addresses both for large array and constexpr const char * A = "A"; (because of internal linkage).
  • B (std::array<long double, 100>) in any scope had 2 different addresses (address was the same in 2 translation units); additionally all 3 B addresses suggested some different memory location (they were much bigger than other addresses) - I suspect that array was copied in memory at runtime.
  • a when used with constexpr types T, e.g. int, const char *, std::array, AND initialized with constexpr expression in source.cpp, was as good as A: immune to fiasco and had the same address in all translation units. If constant of constexpr type T is initialized with non-constexpr, e.g. std::time(nullptr), and used before initialization, it would contain default value (for example, 0 for int). It means that constant's value can depend on static initialization order in this case. So, do not initialize a with non-constexpr value!
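
The last observation can be reproduced in a single file. This is a sketch under an assumption: initialization order across translation units is unspecified, so the example relies on the in-file rule (dynamic initialization follows definition order) to make the effect deterministic. The names early and late are made up.

```cpp
#include <cassert>
#include <ctime>

extern const long late;  // declared here, defined further down

// Dynamic initialization runs in definition order within one translation
// unit, so this reads `late` before its initializer has run. At that
// point `late` has only been zero-initialized.
const long early = late + 1;

// std::time is not a constant expression, so `late` needs dynamic
// initialization.
const long late = static_cast<long>(std::time(nullptr));
```

Here early ends up as 1 regardless of the current time. Replacing the std::time call with a constant expression makes late a statically initialized constant, and the problem disappears.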

The bottom line

  1. prefer A in class scope for any constexpr constant in most cases because it combines perfect safety, simplicity, memory saving and performance.
  2. a (initialized with constexpr value in source.cpp!) should be used if namespace scope is preferable or it is desirable to avoid initialization in header.hpp (in order to reduce dependencies and compilation time). a has one disadvantage compared to A: it can be used in compile-time expressions only in source.cpp and only after initialization.
  3. B should be used for small T in some cases: when namespace scope is preferable or template compile-time constant is needed (pi for example). Also B can be used when constant's value is rarely used or used only in exceptional situations, e.g. error messages.
  4. Other alternatives should almost never be used as they would rarely suit better than all 3 before-mentioned ways.
    • A in namespace scope should not be used because it can potentially lead to N instances of constant, hence consume sizeof(T) * N bytes of memory and cause cache misses. Here N equals to the number of translation units that include header.hpp. As noted in this proposal, A in namespace scope can violate ODR if used in inline function.
    • C could be used for big T (B is usually better for small T) in 2 rare scenarios: when function call is preferable; when namespace scope AND initializing in header is preferable.
    • D could be used when function call AND initializing in source file is preferable.
    • The only shortcoming of C compared to A and B - its return value can not be used in compile-time expression. D suffers from the same shortcoming and another one: function call run-time performance penalty (because it can not be inlined).
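
As an illustration of the template case mentioned for B, here is a minimal sketch. The digits of the literal and the name kTwoPi are illustrative assumptions.

```cpp
#include <cassert>

namespace Ns {
// Alternative B as a function template: one definition serves every
// arithmetic type.
template <typename F>
constexpr F pi() {
    return static_cast<F>(3.14159265358979323846L);
}

// The result is usable in constant expressions, unlike the results of
// C and D.
constexpr double kTwoPi = 2 * pi<double>();
static_assert(pi<int>() == 3, "conversion to int truncates");
}  // namespace Ns
```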

2

Avoid using non-constexpr a because of static initialization order fiasco. Consider a only in case of sure bottleneck. Otherwise, safety is more important than small performance gain. b, c and d are much safer. However c and d have 2 safety requirements:

for (auto f : {all c and d-like functions}) {

  • T constructor must not call f because if the initialization of static local variable recursively enters the block in which the variable is being initialized, the behavior is undefined. This is not difficult to ensure.
  • For each class X such that X::~X calls f and there is a statically initialized X object: X::X must call f. The reason is that otherwise static const T from f could be constructed after and therefore destructed before global X object; then X::~X would cause UB. This requirement is much more difficult to guarantee than the previous one. So it almost prohibits global or static local variables with complicated destructors that use global constants. If destructor of statically initialized variable is not complicated, e.g. uses f() for logging purposes, then placing f(); in the corresponding constructor ensures safety.

}
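
The second requirement can be sketched as follows. The class X, its static member last_message_length and the value "shutdown" are hypothetical. Because X::X calls f, the static inside f finishes construction before the X object does, and is therefore destroyed after it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// c-style accessor: one std::string, constructed on first call.
inline const std::string& f() {
    static const std::string s = "shutdown";
    return s;
}

// Hypothetical class whose destructor needs the constant.
class X {
public:
    // Calling f() here completes the construction of f's static before
    // this X finishes constructing, so the static outlives this X.
    X() { f(); }
    ~X() { last_message_length = f().size(); }
    static std::size_t last_message_length;
};
std::size_t X::last_message_length = 0;

X global_x;  // static-storage X: its destructor runs at program exit
```

Without the f() call in the constructor, global_x could be constructed before the static in f and destroyed after it, and ~X would then read a destroyed string.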

Note: these 2 requirements do not apply to C and D:

  • the recursive call to f would not compile;
  • static constexpr T constants in C and D are constructed at compile time - before any non-trivial variable is constructed, so they are destructed after all non-trivial variables' destruction (destructors are called in reverse order).

Note 2: C++ FAQ suggests a different implementation of c and d, which does not impose the second safety requirement. However in this case static constant is never destructed, which can interfere with memory leak detection, e.g. Valgrind diagnostic. Memory leaks, however benign, should be avoided. So these modified versions of c and d should be used only in exceptional situations.
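
A minimal sketch of that never-destroyed variant, in the style of the FAQ's "construct on first use" idiom. The function name c_never_destroyed and the value are made up.

```cpp
#include <cassert>
#include <string>

// The object is allocated with new and never deleted, so no destructor
// of a static object can observe it in a destroyed state. The cost is an
// allocation that leak checkers may report.
inline const std::string& c_never_destroyed() {
    static const std::string* const p = new std::string("U");
    assert(p != nullptr);
    return *p;
}
```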

One more alternative to consider here is a constant with internal linkage:

// header.hpp
namespace Ns { namespace { const T a1 = <some value>; } }

This approach has the same big downside as A in namespace scope: internal linkage can create as many copies of a1 as the number of translation units that include header.hpp. It can also violate ODR in the same way as A. However, since other options for non-constexpr are not as good as for constexpr constants, this alternative actually could have some rare use. BUT: this "solution" is still prone to static initialization order fiasco in case when a1 is used in public function which in turn is used for initialization of a global object. So introducing internal linkage does not solve the problem - just hides it, makes it less likely, probably more difficult to locate and fix.

The bottom line

  • c provides the best performance and saves memory because it facilitates reusing exactly one T instance and can be inlined, so it should be used in most cases.
  • d is as good as c for saving memory but is worse for performance as it would never be inlined. However d can be used to reduce compilation time.
  • consider b for small types or for rarely used constants (in rarely-used-constant case its definition can be moved to source.cpp to avoid recompilation on change). Also b is the only solution if safety requirements for c and d can not be satisfied. b is definitely not good for large T if constant is used often, because the constant has to be constructed each time b is called.

Note: there is another compile-time issue of inline functions and variables initialized in header.hpp. If constant's definition depends on another constant declared in a different header bad.h, and header bad.h should not be included in header.hpp, then D, d, a and modified b (with definition moved to source.cpp) are the only alternatives.

vedg
  • Another significant difference for `c`/`d` vs other non-`constexpr` versions is that the initialization is performed lazily. Only on the first call of `c` or `d` will the value to initialize `t` with be found. This is very significant if `<some value>` is not a compile time constant but rather something that requires run time computation or which has side effects, or the same apply to `T`'s constructor. It's easy to imagine a library where users may not ever call every `f` provided. The one time lazy initialization is also thread safe, which is not trivial to code from scratch. – TrentP Jul 18 '17 at 22:39
  • 1. _"or which has side effects"_ - I'd **much** rather avoid side effects in a "global constant". 2. Yes, the situation when `f` is not called is ideal performance-wise. – vedg Jul 19 '17 at 18:04
  • 3. I've grown to dislike static variables in functions since I wrote this answer. Thread safety comes with a performance cost at each function call (a mutex lock I think). Nowadays I'd prefer passing context data constructed during the app/library initialization into constructors/functions instead of `static`-storing it in functions. This may also make the program logic easier to understand and modify. For example, it is possible to read constants from a configuration file into the context object during initialization instead of hardcoding the constants in the program code. – vedg Jul 19 '17 at 18:06
  • 4. If I understand correctly the C++17's inline variables are very similar to inline functions and may incur the same performance costs as static variables in functions. – vedg Jul 19 '17 at 18:09