I am aware of how ODR, linkage, static, and extern "C" work with functions. But I am not sure about visibility of types since they cannot be declared static and there are no anonymous namespaces in C.

In particular, I would like to know whether the following code is valid when compiled as C and as C++:

// A.{c,cpp}
typedef struct foo_t{
    int x;
    int y;
} Foo;

static int use_foo() 
{ 
    Foo f;
    f.x=5;
    return f.x;
}
// B.{c,cpp}
typedef struct foo_t{
    double x;
} Foo;

static int use_foo() 
{ 
    Foo f;
    f.x=5.0;
    return f.x;// Cast on purpose
}

I compile it with the following two commands (I know both compilers autodetect the language based on file extensions, hence the different names):

  • g++ -std=c++17 -pedantic -Wall -Wextra a.cpp b.cpp
  • gcc -std=c11 -pedantic -Wall -Wextra a.c b.c

Version 8.3 of both compilers happily compiles the files without any errors. Clearly, if both struct symbols have external linkage, there is an ODR violation because the definitions are not identical. Yes, the compiler is not required to report it, hence my question, because neither did.

Is it a valid C++ program?

I do not think so, that is what anonymous namespaces are for.

Is it a valid C program?

I am not sure here. I have read that types are considered static, which would make the program valid. Can someone please confirm?

C,C++ Compatibility

If these definitions were in public header files, perhaps in different C libraries, and a C++ program included both, each in a different TU, would that be an ODR violation? How can one prevent this? Does extern "C" play any role?

Peter Cordes
Quimby
  • @KamilCuk Fair point, the symbols then. Sure, imagine they are used in appropriate ways to trigger the ODR rules. – Quimby Oct 20 '21 at 08:49
  • C++ types have linkage, C types don't. And if memory serves, C++20 introduced some changes that make `typedef { ... } bar;` types behave more like C types. – StoryTeller - Unslander Monica Oct 20 '21 at 08:55
  • @StoryTeller-UnslanderMonica, it's not that types have linkage. It's rather that default constructors/destructors have linkage, which may cause problems. – tstanisl Oct 20 '21 at 08:56
  • @tstanisl - No, they **literally** have linkage https://timsong-cpp.github.io/cppwp/n4659/basic.link#2 – StoryTeller - Unslander Monica Oct 20 '21 at 08:57
  • Both types are incompatible but it does not trigger any UB in C until you have any code that depends on compatibility of those structures – tstanisl Oct 20 '21 at 08:58
  • [For C] If, say, an external variable is declared with the wrong type, but the type used in the declaration and definition share an identically named struct tag, then that is just a coincidence that the types share the same tag. The problem is not the tag; the problem is the incompatible types of the declaration and definition of the external variable. – Ian Abbott Oct 20 '21 at 08:59
  • I was writing an answer but now you changed the question completely, so I'm off, bye. – Lundin Oct 20 '21 at 09:01
  • Added a simple use case, I am not sure what is exactly needed to trigger the ODR rules. My point is whether this code is unsafe because there exists valid usage that violates ODR? – Quimby Oct 20 '21 at 09:01
  • @Lundin Sorry about that, I was asked to add a use case by KamilCuk. I can revert the changes if you want. Or just post the answer anyway. – Quimby Oct 20 '21 at 09:02
  • @Lundin, looking over the change history of the question, the changes seem minimal. So "changed the question completely" seems way exaggerated. – JHBonarius Oct 20 '21 at 13:43
  • @JHBonarius Speaking of linkage of _types_ doesn't make sense in C and previous C++ standards. Speaking of linkage of _objects_ has been there since the dawn of time in all these languages and versions. The question changed from speaking of the former to the latter. – Lundin Oct 20 '21 at 13:49

3 Answers

19

For references, I will use draft n1570 (C11) for the C language and draft n4860 (C++20) for the C++ language.

  1. C language

    Types have no linkage in C: 6.2.2 Linkages of identifiers §6:

    The following identifiers have no linkage: an identifier declared to be anything other than an object or a function...

    That means that the types used in a.c and b.c are unrelated: each compilation unit correctly declares objects of its own, distinct type.

  2. C++ language

    Types do have linkage in C++. 6.6 Program and linkage [basic.link] says (emphasis mine):

    • §2:

    A name is said to have linkage when it might denote the same object, reference, function, type, template, namespace or value as a name introduced by a declaration in another scope

    • §4

    An unnamed namespace or a namespace declared directly or indirectly within an unnamed namespace has internal linkage. All other namespaces have external linkage. A name having namespace scope that has not been given internal linkage above and that is the name of
    ...
    a named class...
    ...
    has its linkage determined as follows:
    — if the enclosing namespace has internal linkage, the name has internal linkage;
    — otherwise, if the declaration of the name is attached to a named module (10.1) and is not exported (10.2), the name has module linkage;
    — otherwise, the name has external linkage

    The types declared in a.cpp and b.cpp share the same name with external linkage but have different definitions: the program is ill-formed, even though the compiler is not required to diagnose it.


That being said, most common compilers are able to compile either C or C++ sources, and I would bet a coin that they try hard to share most of the implementation between the two languages. For that reason, I would trust real-world implementations to produce the expected results even for C++. But undefined behaviour does not forbid expected results...
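
For the C++ side, the usual way to keep such a type private to one translation unit is the unnamed namespace the question mentions: by the §4 wording quoted above, a class name inside it gets internal linkage. Here is a minimal sketch of what a.cpp might look like under that assumption. The `typedef struct foo_t` spelling is dropped because C++ does not need it, and the b.cpp part is only outlined in comments.

```cpp
// a.cpp (sketch): everything inside the unnamed namespace is local to this file.
namespace {

struct Foo {
    int x;
    int y;
};

int use_foo() {
    Foo f{};   // value-initialize so that y is not left indeterminate
    f.x = 5;
    return f.x;
}

}  // namespace

// b.cpp would follow the same pattern with its own definition:
//
// namespace {
// struct Foo { double x; };
// int use_foo() { Foo f{}; f.x = 5.0; return static_cast<int>(f.x); }
// }  // namespace
```

With both files written like this, the two `Foo` types are distinct entities and the clash described above does not arise. This only helps for code compiled as C++; it is not an option for a header that must also compile as C.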

MyStackRunnethOver
Serge Ballesta
  • In practice C++'s types-have-linkage rule is probably because overloads would conflict. `void foo(bar)` would mangle the same way in different files, even if `bar` was defined differently in each translation unit. – Peter Cordes Oct 21 '21 at 10:30
  • @PeterCordes: long time I have not tried to guess the rationale for most C++ language decisions... – Serge Ballesta Oct 21 '21 at 12:20
  • Thank you for the detailed answer, this seems to be even better than the one I accepted. Could you please clarify whether `extern "C"` plays any role in this? Does a type declared inside `extern "C"` obey C rules and thus have no linkage? My concrete example would be two different `Foo` structs defined in separate headers, included in separate TUs, used separately, linked together. That is valid C, invalid C++, right? Can `extern "C"` fix that? – Quimby Oct 21 '21 at 14:35
  • @Quimby According to 9.11 Linkage specifications [dcl.link] (in n4860), `extern "C"` is used to declare *things* having C language linkage. But it also says that language linkage is relevant for *function types, function names with external linkage, and variable names with external linkage*. Applying C rules to types declared in a language linkage block is therefore not mandated and could only be implementation-dependent, as is support for other language linkage specifications like Fortran or Ada... – Serge Ballesta Oct 21 '21 at 15:11
  • ... `extern "C"` is meant to allow interfacing with modules written in the C language. It is not meant to *temporarily allow C semantics* in a C++ module. – Serge Ballesta Oct 21 '21 at 15:14
7

For C, the program is valid. The only requirement that applies here is the "strict aliasing rule", which says that an object can be accessed only via an lvalue of a compatible type (plus a few exceptions described in 6.5p7).

The compatibility of structures/unions defined in separate translation units is defined in 6.2.7p1.

... two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.

Therefore the structures are not compatible in the example.
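
To make the comparison concrete, here is a small sketch, written in C++ only so that `static_assert` and `<type_traits>` can be used. `FooA` and `FooB` are hypothetical stand-ins for the two `Foo` definitions, renamed because a single translation unit cannot hold both. Note that `std::is_same` tests for identical types, which is stricter than C's notion of compatible types, but `int` and `double` fail under either rule.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the two definitions of Foo.
struct FooA { int x; int y; };   // Foo as defined in A.c
struct FooB { double x; };       // Foo as defined in B.c

// 6.2.7p1 requires corresponding members to have compatible types.
// The first members already differ (int vs. double).
constexpr bool first_members_match =
    std::is_same<decltype(FooA::x), decltype(FooB::x)>::value;

static_assert(!first_members_match,
              "member x has a different type in each definition");
```

The member counts differ as well (two versus one), so there is no one-to-one correspondence between members either.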

However, this is not an issue because the object f is created and accessed via the locally defined type. UB would be invoked if the object were created with the Foo type defined in one translation unit and accessed via the other Foo type in the other translation unit:

// A.c
typedef struct foo_t{
    int x;
    int y;
} Foo;

void bar(void *f);

void foo(void)
{ 
    Foo f;
    bar(&f);
}

// B.c
typedef struct foo_t{
    double x;
} Foo;

// using void* to avoid passing pointer to incompatible types
void bar(void *f_) 
{ 
    Foo *f = f_;
    f->x=5.0; // UB!
}
tstanisl
  • I don't think calling function is even necessary for the UB to happen. According to 6.2.7p2 even the declaration `void bar(Foo *f);` in A.c should be enough for UB. – user694733 Oct 20 '21 at 09:29
  • @user694733, all pointers to structures have the same representation. It's not UB yet. – tstanisl Oct 20 '21 at 09:30
  • Thank you, I understand the mismatch between the declaration of `bar` and its definition which uses different `Foo`. So the fact that two separate, different `Foo` structures co-exist in separate TUs is OK in C as long as you avoid this demonstrated mismatch? Because I strongly believe in C++ the mere (separately used) definitions violate ODR. [This answer](https://stackoverflow.com/a/9364985/7691729) directly quotes the Standard and in my case, the `Foo` violate the first condition too. – Quimby Oct 20 '21 at 09:30
  • @Quimby, I agree it is UB in C++ due to linkage of destructors. However, it may not be for Plain-Old-Data types, I don't feel fully competent to answer – tstanisl Oct 20 '21 at 09:34
  • @tstanisl I did not consider PODs, you might be right, still would not be safe in general. Thanks for the answer, I will accept as I was mostly curious about C. – Quimby Oct 20 '21 at 09:36
  • @tstanisl I don't think same representation matters; I don't think 2 pointers are *compatible*. (Though I can't find relevant reference in standard right now). If linker were a dumb one that would always include objects even when they are not used, it could stop, because it can't comprehend that these 2 function declarations are the same. – user694733 Oct 20 '21 at 09:39
  • @user694733, You might be right. Just in case, I'll change the type to `void*`. However, it should be OK if `Foo` was incomplete when `bar` is declared: `typedef struct foo_t Foo; void bar(Foo *f); struct foo_t { ... };` – tstanisl Oct 20 '21 at 09:53
2

Other answers point out that this is an ill-formed program in C++.

In practice, link errors on overloaded functions would be possible if you have two separate definitions of (non-static) void foo(bar); in separate translation units. I expect this is (part of) why C++ has this rule that (some) types have external linkage.

If types were truly private, those wouldn't conflict. But they'll name-mangle the same way, because if both TUs do have the same definition of the type bar (e.g. via a .h or manual copying), they need to resolve to calling the same function.

// A.cpp
typedef struct foo{  // names ending with _t are reserved (by POSIX)
    int x;
    int y;
} Foo;

int take_foo(Foo f) {
    return f.x;
}

int main(){}  // so it's linkable without special options like -nostdlib and linker entry-point defaults

// B.cpp
typedef struct foo{
    double x;
} Foo;

double take_foo(Foo f) {
    return f.x;
}

In case it matters, these functions will compile to different machine code on some targets, including x86-64 System V ABI where I tested it. (The first double arg is already in the return-value register, even if inside a struct containing only a couple doubles. But unlike ARM64 and some other RISCs, the first integer arg is not passed in the return-value register, so a mov is required before the ret.)

$ g++ [AB].cpp
/usr/bin/ld: /tmp/ccM89kvx.o: in function `take_foo(foo)':
B.cpp:(.text+0x0): multiple definition of `take_foo(foo)'; /tmp/cckZ5qRG.o:A.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

There's no error if the functions or the struct tags have different names. (And yes, I compiled with optimization disabled, and no link-time optimization, so nothing had a chance to remove unused functions before they conflicted.)

However, just changing the typedef name without changing the struct tag isn't sufficient. That makes sense; all typedefs for the same type need to resolve to the same asm name, so GCC mangles based on the struct tag even if you don't use it directly. Note the linker error messages demangling it back to take_foo(foo) not Foo.
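
A quick way to see why the typedef name cannot matter is that a typedef introduces an alias and not a new type. The sketch below checks this at compile time; `OtherName` is a hypothetical second alias added purely for illustration.

```cpp
#include <type_traits>

typedef struct foo {
    int x;
    int y;
} Foo;

typedef foo OtherName;  // hypothetical second alias for the same struct

// Both aliases denote the one type `foo`, so a function taking either of them
// has the same signature and therefore the same mangled name.
static_assert(std::is_same<Foo, foo>::value, "Foo is just another name for foo");
static_assert(std::is_same<OtherName, Foo>::value, "OtherName is the same type too");

int take_foo(Foo f) {
    return f.x;
}

// Uncommenting this is rejected as a redefinition, not accepted as an overload:
// int take_foo(OtherName f) { return f.x; }
```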

I didn't go through the standard wording to see if two typedef ... Foo would be legal in ISO C++, despite not being a problem in practice for real-world C++ implementations.

Making either function static would fix the problem, too, because it's fine for static functions to have the same asm name.
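
As a sketch of that fix, B.cpp might look like the following. `b_entry` is a hypothetical externally visible function added here only so that the static function has a caller. This removes the duplicate-symbol link error; it does not change what the other answers say about the two differing struct definitions themselves.

```cpp
// B.cpp (sketch)
typedef struct foo {
    double x;
} Foo;

// Internal linkage: this symbol is not visible to the linker as a global
// definition, so it cannot collide with A.cpp's take_foo(foo).
static double take_foo(Foo f) {
    return f.x;
}

// Hypothetical entry point with a name unique to this file.
double b_entry(double v) {
    Foo f;
    f.x = v;
    return take_foo(f);
}
```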

This would also have a linker error if compiled as C, which doesn't have function overloading so it's already a problem to have two non-static take_foo functions in the same program regardless of their args being structs of the same tag-name or not.

Peter Cordes