This question builds off these two stackoverflow posts:

Here's the question: Why doesn't the multiple definition error appear for classes/structs/enums? Why does it only apply to functions or variables?

I wrote some example code in an effort to capture my confusion. There are four files: namespace.h, test.h, test.cpp, and main.cpp. namespace.h is included in both test.cpp and main.cpp, which leads to a multiple definition error if any of the lines marked BAD are uncommented.

// namespace.h
#ifndef NAMESPACE_H
#define NAMESPACE_H

namespace NamespaceTest {
    // 1. Function in namespace: must be a declaration, not a definition
    int test(); // GOOD

    // int test() { // BAD
    //    return 5;
    //}

    // 2. Classes can live in header file with full implementation
    // But if the member function is defined outside of the struct (without inline), it causes a multiple definition error at link time
    struct TestStruct {
        int x;
        int test() { return 10; } // GOOD
    };

    //int TestStruct::test() { // BAD
    //    return 10;
    //}

    // 3. Variables are also not spared from the multiple definition error.
    //int x = 20; // BAD

    // 4. But enums are perfectly safe.
    enum TestEnum { ONE, TWO }; // GOOD
}

#endif

// test.h
#ifndef TEST_H
#define TEST_H

class Test {
public:
    int test();
};
#endif

// test.cpp
#include "test.h"
#include "namespace.h"

int NamespaceTest::test() {
    return 5;
}

int Test::test() {
    return NamespaceTest::test() + 1;
}

// main.cpp
#include <iostream>
#include "namespace.h"
#include "test.h"

int main() {
    std::cout << "NamespaceTest::test: " << NamespaceTest::test() << std::endl;

    Test test;
    std::cout << "Test::test: " << test.test() << std::endl;

    NamespaceTest::TestStruct test2;
    std::cout << "NamespaceTest::TestStruct::test: " << test2.test() << std::endl;

    std::cout << "NamespaceTest::x: " << NamespaceTest::TestEnum::ONE << std::endl;
}

g++ test.cpp main.cpp -o main.out && ./main.out

NamespaceTest::test: 5
Test::test: 6
NamespaceTest::TestStruct::test: 10
NamespaceTest::x: 0
Daniel Handojo
  • This doesn't really have anything to do with namespaces. Remove them and you still have the same issue. – Retired Ninja Jun 29 '21 at 23:40
  • The language designers wanted it that way . And it seems like a reasonable design -- it'd be hard to program if you couldn't use a class in a header included in two different units, right? – M.M Jun 29 '21 at 23:41
  • @RetiredNinja agreed, the issue exists separate of namespaces. – Daniel Handojo Jun 29 '21 at 23:47
  • @M.M I'm looking for something more specific. Yes, it would be inconvenient if classes couldn't be defined in header files, but couldn't the same logic apply to variables or functions? What's the logical limitation preventing those from having duplicates? – Daniel Handojo Jun 29 '21 at 23:48
  • If there were duplicate function definitions how would the linker decide which one to call when you call the function? I guess you could say this topic is related to the separate compiler/linker model, classes do not have linkage whereas functions do – M.M Jun 29 '21 at 23:50
  • See my partial answer below. Inline is a curious specifier that allows for duplicate definitions of functions as long as they're identical. Since the duplicate definitions error comes from using the same header file in two "translation units" (not sure if I'm using that term right), we know they're identical and clearly this is somehow known by the linker as well. Why can't the linker tell in all cases? – Daniel Handojo Jun 30 '21 at 00:03
  • Basically, it's because functions generate code and classes do not. – Pete Becker Jun 30 '21 at 12:49
  • The reason for this behavior is that 50 years ago hardware was a lot smaller and slower than it is today. Instead of having one giant source code file, source code was split up into smaller files that could be incrementally built and linked much faster. C++ inherited this design from C. There's hope! C++20 **modules** should help C++ become more like other modern languages that don't have this kind of constraint. – Eljay Jun 30 '21 at 13:11
  • classes don't generate code, that's interesting. Can you elaborate on that @PeteBecker? – Daniel Handojo Jun 30 '21 at 19:20
  • @Eljay I just learned that C++17 has inline variables. I'm still living in C++11 land, barely. So wow the future is crazy. – Daniel Handojo Jun 30 '21 at 19:30
  • @PeteBecker, I think I kinda get what you mean now. A class definition is used by the compiler to understand stuff like how much memory to allocate for local variables. In other words, that gets captured in the code section of an object file. The functions take up memory, but we've already confirmed they're inline. Static member variables fall into that bucket too (at least in C++17, I guess). I still wonder how the compiler guarantees the multiple class definitions are identical, but the claim that classes don't take space makes sense now. – Daniel Handojo Jun 30 '21 at 21:37

2 Answers

After reading the cppreference page on the inline specifier, I have a partial answer. The rules for inline stipulate that functions defined within a class body are implicitly inline. Inline functions are permitted to have duplicate definitions provided that (1) the definitions live in separate translation units and (2) they are identical. I'm paraphrasing, but that's the gist.

That explains why the member functions are legal, but not why multiple definitions of the class or enum are OK. I imagine the explanation is similar, but it would be good to know for sure.

Daniel Handojo

Generally, when you compile a definition at namespace scope (such as a function or a global variable), the compiler emits a global symbol for it. If the same definition appears in multiple translation units, the symbols conflict at link time: there are multiple definitions, and although they happen to be equivalent, the linker can't check this.

This is part of the one definition rule (ODR): each non-inline function or variable that the program uses must have exactly one definition in the entire program, in exactly one of the translation units.

There are some exceptions to this, for example, class definitions and inline functions/variables. These may be defined once in each translation unit that needs them, but the definitions must be exactly the same (token for token) in every translation unit they appear in. Class definitions are meant to be #included, so it makes sense to allow them to appear in multiple translation units.

If you define a member function inside the class body, it is implicitly inline, because otherwise you could not include the class definition together with the member function definition in more than one translation unit without breaking the ODR. For example, these three are functionally equivalent:

struct TestStruct {
    int x;
    int test() { return 10; }
};

// Could have been written

struct TestStruct {
    int x;
    inline int test() { return 10; }
};

// Or as

struct TestStruct {
    int x;
    int test();  // The `inline` specifier could also be here
};

inline int TestStruct::test() { return 10; }

You can do this to your namespace-scoped functions and variables too: inline int test() { return 5; } and inline int x = 20; would have compiled with no further issue (note that inline variables require C++17).
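For code that still has to build as C++11 or C++14, where inline variables are not available, one common workaround is to wrap the variable in an inline function that holds a function-local static. The following is only a minimal sketch of that idea, not part of the original example; the header name and the names test_inline, shared_x, and check_shared_x are hypothetical.

```cpp
// shared_state.h (hypothetical header, meant to be included from several .cpp files)
#ifndef SHARED_STATE_H
#define SHARED_STATE_H

#include <cassert>

namespace NamespaceTest {
    // Explicitly inline: every translation unit that includes this header
    // may contain this (identical) definition without a link error.
    inline int test_inline() { return 5; }

    // C++11-compatible stand-in for `inline int x = 20;`.
    // A static local inside an inline function refers to the same object
    // in every translation unit.
    inline int& shared_x() {
        static int value = 20;
        return value;
    }

    // Small self-check: writes through the reference are visible on the
    // next call, because both calls return the same object.
    inline void check_shared_x() {
        int before = shared_x();
        shared_x() = before + 1;
        assert(shared_x() == before + 1);
        shared_x() = before;  // restore the original value
    }
}

#endif
```

The accessor function costs a pair of parentheses at each use, which is the trade-off for not needing a separate .cpp file to hold the single definition.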

This is implemented by the compiler emitting "specially marked" symbols for inline entities, and the linker picking one arbitrarily since they should all be the same.

The same exception to ODR also exists for templated functions / variables and enum declarations, since they are also meant to live in header files.
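As a rough illustration of that last point, the header sketched below defines a function template and an enum directly, with no inline keyword on either. The header name and the names twice, Color, and check_header_entities are made up for this sketch and do not come from the question.

```cpp
// header_entities.h (hypothetical header, meant to be included from several .cpp files)
#ifndef HEADER_ENTITIES_H
#define HEADER_ENTITIES_H

#include <cassert>

namespace NamespaceTest {
    // A function template may be defined in a header without `inline`;
    // identical definitions in several translation units are allowed.
    template <typename T>
    T twice(T value) { return value + value; }

    // An enum definition only names a type and its constants, so repeating
    // it in several translation units does not produce a link error.
    enum Color { RED, GREEN, BLUE };

    // An ordinary helper function, by contrast, still needs `inline` here.
    inline void check_header_entities() {
        assert(twice(21) == 42);
        assert(GREEN == 1);
    }
}

#endif
```

One caveat: a full explicit specialization of a function template is an ordinary function as far as the ODR is concerned, so it needs inline if it is defined in a header.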

Artyer
  • "This is implemented by the compiler emitting "specially marked" symbols for inline entities, and the linker picking one arbitrarily since they should all be the same." -- This implies there isn't a check on if the definitions are identical? By the way, I'm fine with marking this answer as correct, but do you have a reference discussing how textually identical definitions are identified as such? Also, @PeteBecker mentioned that classes don't generate code. What do you think he meant by this? We don't have "inline" classes, so what rule allows classes to be defined multiple times? – Daniel Handojo Jun 30 '21 at 19:29
  • @DanielHandojo Correct, they aren't checked for equivalence before the linker decides on one to use. And from the standard: "each definition of D [the entity appearing in multiple translation units] shall consist of the same sequence of tokens" (so they have the same source code / text). – Artyer Jul 01 '21 at 10:26
  • Classes on their own don't generate symbols or have any machine code associated with them (other than member functions / static initialisers, which don't have to be inline, but the class itself is "inline" in a sense). Class definitions are explicitly required (and so allowed) to have multiple definitions in different translation units: "Exactly one definition of a class is required in a translation unit if the class is used in a way that requires the class type to be complete" – Artyer Jul 01 '21 at 10:28
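
The point in the comments about classes not generating code on their own can be seen in a small sketch: everything the compiler takes from a class definition, such as its size and member layout, is resolved at compile time in each translation unit. This assumes a struct like the one in the question; the helpers read_x and check_layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct TestStruct {
    int x;
    int test() { return 10; }  // defined in the class body, so implicitly inline
};

// Both checks are evaluated by the compiler while it processes the class
// definition; the definition itself does not produce a symbol for the linker.
static_assert(sizeof(TestStruct) >= sizeof(int), "TestStruct must hold an int");
static_assert(offsetof(TestStruct, x) == 0, "x is the first member");

// Only functions that use the class (like this one) produce machine code.
inline int read_x(const TestStruct& s) { return s.x; }

inline void check_layout() {
    TestStruct s;
    s.x = 7;
    assert(read_x(s) == 7);
    assert(s.test() == 10);
}
```

Each translation unit that includes such a definition repeats these compile-time computations independently, which is why the definitions have to match token for token.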