16

If I include <string> or <vector> in multiple translation units (different .cpp files), why doesn't it break the ODR?

As far as I know, each .cpp file is compiled on its own, so std::vector's member functions will be generated separately for each object file, right?

The linker should detect it and raise an error. Even if it doesn't (I suspect it's a special case for templates), will it reuse the same machine code, or will each translation unit get its own cloned copy, when I link everything together?

Jan Schultke
barney
  • Essentially, the compiler and linker conspire to make it work, using the same mechanism that inline functions use. – Igor Tandetnik Dec 31 '15 at 23:26
  • As you suspect, it is a special case for templates, as for inline functions. The definitions in different files must be exactly identical to avoid violating the ODR. – Revolver_Ocelot Dec 31 '15 at 23:28
  • barney: Why don't you try picking out a specific phrase in the ODR that you think is violated, and why the stated exceptions don't apply? – Chris Beck Dec 31 '15 at 23:28

4 Answers

22

The same way any template definitions don't break the ODR — the ODR specifically says that template definitions may be duplicated across translation units, as long as they are literally duplicates (and, since they are duplicates, no conflict or ambiguity is possible).

There can be more than one definition of a class type (Clause [class]), enumeration type ([dcl.enum]), inline function with external linkage ([dcl.fct.spec]), class template (Clause [temp]), non-static function template ([temp.fct]), static data member of a class template ([temp.static]), member function of a class template ([temp.mem.func]), or template specialization for which some template parameters are not specified ([temp.spec], [temp.class.spec]) in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. [...]

- C++14 Standard, [basic.def.odr] p6

Multiple inclusions of <vector> within the same translation unit are expressly permitted and effectively elided, more than likely by "#ifndef" header guards.
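
As a rough illustration of how a guarded header tolerates repeated inclusion, the sketch below pastes the body of a made-up header twice into one file, which is roughly what two `#include` lines would do. `MY_POINT_HPP`, `Point`, `coordinate_sum` and `demo_header_guard` are hypothetical names for this example only; standard library implementations are free to use a different mechanism.

```cpp
#include <cassert>

// First "#include": MY_POINT_HPP is not defined yet, so the body is compiled.
#ifndef MY_POINT_HPP
#define MY_POINT_HPP
struct Point {
    int x;
    int y;
};
inline int coordinate_sum(Point p) { return p.x + p.y; }
#endif // MY_POINT_HPP

// Second "#include": the macro is now defined, so the preprocessor skips
// everything up to #endif and Point is not defined a second time.
#ifndef MY_POINT_HPP
#define MY_POINT_HPP
struct Point {
    int x;
    int y;
};
inline int coordinate_sum(Point p) { return p.x + p.y; }
#endif // MY_POINT_HPP

// Stand-in for code in the .cpp file that included the header twice.
void demo_header_guard() {
    Point p{3, 4};
    assert(coordinate_sum(p) == 7);
}
```

Without the guard, the second copy of `struct Point` would be a redefinition within the same translation unit and the compiler would reject it.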

Jan Schultke
Lightness Races in Orbit
  • I see... But what about code duplication? The vector class contains code (templated and abstract enough, but still code). So it would generate code for each inclusion in each translation unit where I use it, right? So, even if I use std::vector everywhere, identical code would be generated separately for each module... that looks suboptimal... – barney Dec 31 '15 at 23:34
  • @barney: Yes, it is sub-optimal. C++'s compilation model (mostly inherited from C) has its flaws, and the addition of templates made them worse. That's one of the big reasons that C++ compilation is seen as being so remarkably slow: it has to parse each definition for _every_ compilation unit. And then it has to resolve all those duplicates and elide them at link time. Nobody's saying that this is the best way it can be done, only that it is the way C++ does it. :) – Lightness Races in Orbit Dec 31 '15 at 23:35
  • @barney: The linker removes all the duplicates during the link. – Zan Lynx Dec 31 '15 at 23:36
  • The compiler will in fact generate a `vector` implementation in every compilation unit where it's needed. Sometimes that means inline code (most methods in `vector` are quite lightweight) and other times it means standalone functions. The linker will sort it out and elide any duplicate copies of standalone functions. – StilesCrisis Dec 31 '15 at 23:36
  • @LightnessRacesinOrbit They keep trying to come up with a workable "modules" implementation...no luck yet. – Zan Lynx Dec 31 '15 at 23:36
  • @barney: the linker is supposed to eliminate those duplicates, so that if you take an address `&f` where `f` is a class member function of some template like `vector` or something, it should give the same address no matter what compilation unit you are in – Chris Beck Dec 31 '15 at 23:36
  • I see... so the compiler (theoretically) generates redundant code and the linker removes all the duplicates, right? Yes, modules would be an awesome feature to resolve this. Anyway, I love C++ for its zero-cost abstractions approach! – barney Dec 31 '15 at 23:39
  • @LightnessRacesinOrbit btw, if templates are not classes but "instructions for the compiler to generate class code", why not add full-fledged meta-code there? The current template syntax is weird and cryptic (especially the matching of overloaded/templated/specialized versions, and type-traits tricks). Wouldn't it be nice to have syntax for compile-time runnable C++ code that explicitly defines all the template generation rules in source code? Maybe a crazy idea :) maybe not C++ but a special DSL for code generation. :) – barney Jan 01 '16 at 00:01
  • @barney: Well then that would be a different language wouldn't it – Lightness Races in Orbit Jan 01 '16 at 00:13
  • Formally, the last sentence is not 100% correct. Multiple inclusion of the same standard header in the same translation unit is covered by § 17.6.2.2/2, which says: *"Each may be included more than once, with no effect different from being included exactly once (...)"*. Header guards are but an *implementation* of this rule :) – Christian Hackl Jan 01 '16 at 13:39
  • @ChristianHackl: Granted – Lightness Races in Orbit Jan 01 '16 at 13:39
7

The standard has a special exception for templates that allows duplicate definitions of functions that would otherwise violate the ODR (such as functions with external linkage and non-inline member functions). From C++11 3.2/5:

If D is a template and is defined in more than one translation unit, then the preceding requirements shall apply both to names from the template’s enclosing scope used in the template definition (14.6.3), and also to dependent names at the point of instantiation (14.6.2). If the definitions of D satisfy all these requirements, then the program shall behave as if there were a single definition of D. If the definitions of D do not satisfy these requirements, then the behavior is undefined.
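
To make "behave as if there were a single definition" a little more concrete, here is a minimal sketch using a static data member of a class template. `InstanceCounter`, `bump_from_a`, `bump_from_b` and `demo_single_definition` are hypothetical names; the two `bump_from_*` functions stand in for code that would normally live in two different .cpp files including the same header. In a single file the result is trivially true, so treat this as an illustration of the expected behavior rather than a proof of it.

```cpp
#include <cassert>

// Would normally live in a header included by several .cpp files.
template <typename T>
struct InstanceCounter {
    static int count;                      // declared inside the class template
    static int bump() { return ++count; }
};

// Defined in the header too; this is allowed because it is templated.
template <typename T>
int InstanceCounter<T>::count = 0;

// Stand-ins for code in two different translation units.
int bump_from_a() { return InstanceCounter<int>::bump(); }
int bump_from_b() { return InstanceCounter<int>::bump(); }

void demo_single_definition() {
    assert(bump_from_a() == 1);
    assert(bump_from_b() == 2);            // same counter, not a per-file copy
}
```

Each distinct specialization, such as `InstanceCounter<long>`, still gets its own `count`; it is only duplicates of the same specialization that behave as one.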

Michael Burr
1

The ODR doesn't state that a struct may only be defined once across all compilation units; it states that if you define a struct in multiple compilation units, it has to be the same struct. You would violate the ODR if you had two separate vector types with the same name but different contents. At that point the linker would get confused and you'd get mixed-up code and/or errors.

StilesCrisis
0

The ODR is relaxed for templates

The ODR is "relaxed" for templates and for inline functions/variables. It is possible for a template to appear in multiple translation units, such as:

// a.cpp
template <int N> int foo() { return N; }
// b.cpp
template <int N> int foo() { return N; }

Normally, this is not the result of copy/paste, but a consequence of including headers. The relevant wording is in [basic.def.odr] p14:

For any definable item D with definitions in multiple translation units,

  • if D is a non-inline non-templated function or variable, or
  • if the definitions in different translation units do not satisfy the following requirements,

the program is ill-formed; [...]
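
The distinction drawn in that wording can be sketched as a small header. This is only an illustration under assumptions: `UTIL_HPP`, `twice`, `thrice` and `demo_shared_header` are made-up names, and the commented-out function shows what would typically produce a duplicate-symbol link error if the header were included from more than one .cpp file.

```cpp
#include <cassert>

// Sketch of a header (hypothetical name: util.hpp) shared by several .cpp files.
#ifndef UTIL_HPP
#define UTIL_HPP

// Fine in a shared header: a templated function.
template <int N>
int foo() { return N; }

// Fine in a shared header: an inline function.
// (Inline variables, available since C++17, follow the same rule.)
inline int twice(int x) { return 2 * x; }

// Not fine in a shared header: a non-inline, non-templated function.
// Every .cpp file including the header would define it again.
// int thrice(int x) { return 3 * x; }

#endif // UTIL_HPP

// Stand-in for one of the .cpp files that include the header.
void demo_shared_header() {
    assert(foo<5>() == 5);
    assert(twice(21) == 42);
}
```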

ODR violations are still possible

Note that the definition can appear in multiple translation units without violating the ODR, but this definition needs to be identical everywhere.

// a.cpp
template <int N> int foo() { return N; }
// b.cpp
template <int N> int foo() { return 0; } // IFNDR

The program is ill-formed, no diagnostic required (IFNDR), because the definitions are not the same. This is also why it's important to use headers: they make sure that the same definitions are copied into every translation unit.

What about code duplication?

This issue is resolved by the linker. The program would be ill-formed if any of the definitions weren't the same, so foo<0> in a.cpp and foo<0> in b.cpp must be exactly the same. As a result, the linker can arbitrarily pick one of the two, and remove the other from the executable.

Linkers typically implement this with weak symbols (or COMDAT sections on some platforms).
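
One observable consequence is that the address of an instantiation should be the same no matter which translation unit asks for it. The sketch below uses hypothetical functions `address_seen_by_a` and `address_seen_by_b` as stand-ins for code in a.cpp and b.cpp. Kept in one file like this, the comparison holds trivially; the point is that the same check is expected to pass when the two functions really are compiled separately and linked together.

```cpp
#include <cassert>

// Would normally be in a header shared by a.cpp and b.cpp.
template <int N>
int foo() { return N; }

using FooPtr = int (*)();

// Stand-in for a.cpp: instantiates foo<0> and takes its address.
FooPtr address_seen_by_a() { return &foo<0>; }

// Stand-in for b.cpp: does the same, independently of a.cpp.
FooPtr address_seen_by_b() { return &foo<0>; }

void demo_one_copy() {
    // Both translation units should end up referring to the single kept copy.
    assert(address_seen_by_a() == address_seen_by_b());
    assert(address_seen_by_a()() == 0);
}
```

With GCC or Clang on ELF platforms, running `nm` on the individual object files typically lists such instantiations with a `W` (weak) symbol type, which is how the linker knows it may keep just one.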

Jan Schultke