
When it comes to functions (non-member functions in C++), marking them as static gives them internal linkage. This means that they are not visible outside the translation unit. Why isn't this the default? I don't have a good statistic, but from what I've seen, most functions in implementation files should be marked as static.
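A minimal sketch of the two cases the question contrasts. It assumes a single implementation file; the file name and function names are hypothetical:

```cpp
// image_ops.cpp: one translation unit (all names here are hypothetical).

// Internal linkage: `static` hides this helper from every other translation
// unit, so another .cpp file may define its own `clamp_to_byte` without a
// "multiple definition" error at link time.
static constexpr int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// External linkage, the default for a namespace-scope function: any other
// translation unit that declares `int brighten(int, int);` can call it.
int brighten(int pixel, int amount) {
    return clamp_to_byte(pixel + amount);
}
```

In C the same two definitions work without `constexpr`. It is only there so the helper can be checked at compile time.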

I believe the consensus is to split functionality into smaller units. So, in general, it makes sense that an implementation file contains more "utility"-like functions, which should not be visible in other translation units, than functions that simply implement the public interface.

What is the reason they went with "export everything" by default in this context?

Tagor
  • C and C++ are two different languages. C++ has destructors and the [rule of five](https://en.cppreference.com/w/cpp/language/rule_of_three). C code uses some conventions (e.g. like in [GTK](https://gtk.org/)...). C or C++ code is sometimes generated. – Basile Starynkevitch Aug 24 '23 at 07:58
  • For questions asking about specifics of a language, please try to limit it to a single language. C and C++ are two very different languages, where even things that might seem similar can differ in the specifications. – Some programmer dude Aug 24 '23 at 07:58
  • @BasileStarynkevitch This question is valid for both C and C++. The double tag is understandable. – YSC Aug 24 '23 at 07:59
  • As for the reason why, you probably have to look way back into the history of C, long before it was standardized. I doubt anyone alive could give a definitive answer for the "why". – Some programmer dude Aug 24 '23 at 07:59
  • .h files would be full of `extern` with almost all entries. – 273K Aug 24 '23 at 08:02
  • @Some programmer dude static 'free' functions work the same way in both languages, I think (I just put 'free' to distinguish them from member functions in C++). – Tagor Aug 24 '23 at 08:02
  • Actually, neither C nor C++ has anything called "free" functions. C only has "functions". C++ has "member functions" (might be named something else) and "namespace scope functions" (non-member functions). The term "free" is ambiguous when it comes to functions. – Some programmer dude Aug 24 '23 at 08:03
  • I don't know where you've been looking, but most free functions in implementation files are actually meant to be used from the "outside". Which is why that is the default. – molbdnilo Aug 24 '23 at 08:05
  • Peeking at K&R 1st edition I would _guess_ the reason is that back in the days, functions didn't have storage class specifiers. This feature seems to have been added at some point later on. In which case backwards-compatibility would be the reason not to make `static` the default storage. – Lundin Aug 24 '23 at 08:06
  • @Someprogrammerdude *"term "free" is ambiguous"* Usually it means a non-member function. What else could it mean? – HolyBlackCat Aug 24 '23 at 08:08
  • @HolyBlackCat The `free` function comes to mind. – Lundin Aug 24 '23 at 08:09
  • @HolyBlackCat The `free` function, as mentioned. Or just about any function that "frees" a resource. – Some programmer dude Aug 24 '23 at 08:12
  • Because B, Dennis Ritchie, ... Early C compilers were wildly different about what `static` actually meant. – user207421 Aug 24 '23 at 08:49
  • @Someprogrammerdude I once saw a function in production code named `free_gadget` that actually allocated a gadget. Not just any gadget, but a special type of gadget called "free gadget". – n. m. could be an AI Aug 24 '23 at 09:00
  • @n.m.couldbeanAI It could "borrow" from the JSON-C library and free the memory with `put_free_gadget()`. – Andrew Henle Aug 24 '23 at 21:35
  • It's worth noticing that the keyword `static` appears just once in the Lions' book source of 6th edition UNIX, and that is for a static variable local to a function. There are no functions defined as static. I wonder if the linker at the time could support such things. – Mike Spivey Aug 25 '23 at 12:39
  • "most functions in implementation files should be marked as static" What's the non-opinionated reason for this assertion? – Adrian McCarthy Aug 27 '23 at 00:25
  • @AdrianMcCarthy I don't really have a non-opinionated reason for the assertion. It's just that the big modules I've seen tend to have more static functions in the implementation file. And it seems to align with modern coding standards. I believe the assertion being wrong/opinionated does not invalidate the question. – Tagor Aug 28 '23 at 10:38

6 Answers


In the C/C++ compilation model, the preprocessor runs before everything else, and replaces #includes with their contents.

Hence, there's no difference between a function that's defined in a .cpp file and a function defined in a header it includes.

Your suggestion would make functions defined in headers static by default (which would remove the "multiple definition" linking error). That would be very bad: if you forget inline (in C++), or don't know you're not supposed to define functions in headers (in C), the result would be silent code duplication in the binary instead of an error.
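The header case can be sketched as follows. The header and function names are hypothetical. After preprocessing, the header's text becomes part of each including translation unit, exactly as if it had been typed there:

```cpp
// util.h: hypothetical header, #included by several .cpp files.

// Fine in a header: `inline` allows every translation unit to carry the
// definition, and the program still has one function (`triple` has the same
// address in every translation unit). `constexpr` already implies `inline`;
// both are spelled out here for emphasis.
inline constexpr int triple(int x) { return 3 * x; }

// Not fine in a header under today's rules: uncommented, the definition below
// would appear in every includer, and the linker typically reports a
// "multiple definition" error. If namespace-scope functions were `static` by
// default, it would instead link without complaint, and each including
// translation unit would get its own private copy.
//
// int twice(int x) { return 2 * x; }
```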

HolyBlackCat
  • This answer doesn't make sense in the context of the question. The question speaks of *implementation* files, which more likely means the actual source file rather than a header file. – user694733 Aug 24 '23 at 08:09
  • *Your suggestion would make functions defined in headers `static` by default*... Tagging function prototypes as `extern` would easily solve this problem. – chqrlie Aug 24 '23 at 08:13
  • @user694733 I addressed this in the second paragraph. It's impossible to distinguish between a function defined in a .cpp file and one defined in a header it includes. – HolyBlackCat Aug 24 '23 at 08:15
  • @chqrlie I don't understand. In the scenario I'm describing, there's no prototype at all. Imagine I add the following to a header: `void foo() {std::cout << "Hello!\n";}`. Under current rules, this is a linking error. If you make it implicitly `static` (there's no prototype), this is no longer an error, but rather a silent code duplication in the binary. – HolyBlackCat Aug 24 '23 at 08:17
  • But is this the actual reason the designers of C chose external linkage as the default? If not, then I don't think this answers the question *"Why?"*. – user694733 Aug 24 '23 at 08:31
  • Early programming languages, including early versions of C, did not have different types of linkages. Every identifier outside a function was linked together. That is now called external linkage. Internal linkage was invented later, and the default was already for identifiers declared outside functions to have external linkage. – Eric Postpischil Aug 24 '23 at 10:56
  • Functions aren't usually defined in header files in the first place. Header files are only supposed to contain declarations, not definitions. The exception is inline functions, but these don't require external linkage. – Barmar Aug 25 '23 at 15:31
  • @Barmar Yep, I didn't say otherwise. Not having `static` by default helps diagnose this issue if somebody accidentally does this. – HolyBlackCat Aug 25 '23 at 16:15
  • I don't think the original C designers were very much interested in protecting programmers from themselves. If they were, they wouldn't have made buffer overflows so easy and undetectable. – Barmar Aug 25 '23 at 16:18
  • @Barmar Mhm, I don't know if it's "the" original reason, but IMO asking about the original reason isn't a very interesting question. So instead I'm just answering why it would suck otherwise. – HolyBlackCat Aug 25 '23 at 16:38
  • Who else do you think "they" is in "why they went with"? Surely not the standardization committee; they were just codifying the existing language, and they wouldn't have changed such a fundamental feature. – Barmar Aug 25 '23 at 16:40

  1. C part

    C is now a very old language (from the 1970s...) and is highly conservative. Include files are just meant to be included at the source level. Draft n1570 for C11 explicitly says:

    A source file together with all the headers and source files included via the preprocessing directive #include is known as a preprocessing translation unit. After preprocessing, a preprocessing translation unit is called a translation unit.

    That means that a conformant C compiler does not distinguish between what comes from an included file and what comes from the source file itself, since inclusion happens before the compilation phase.

    This is enough for functions to receive external linkage by default (when they are not declared as static).

  2. C++ part

    Despite being a totally different language, C++ still assumes its inheritance from C. Specifically, the C standard library is still officially a part of the C++ standard library.

    This is probably enough for non-member functions to receive, by default, the same treatment as they receive in C. This is of course far less important than in C, because C functions are actually declared as extern "C". On the other hand, non-member functions are also called namespace-scope functions for a reason, and in C++, scoping is the correct way to handle namespace pollution.

In my opinion, best practice should be to scope everything: use a named namespace to get external linkage, and an anonymous one to restrict a name to the current translation unit. That is enough to avoid changing the C default for non-member functions.
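The "scope everything" approach can be sketched as follows, with hypothetical names. It relies on the rule, noted in Davis Herring's comment below, that since C++11 members of an unnamed namespace have internal linkage, just like `static` functions:

```cpp
namespace text {  // named namespace: text::count_words has external linkage

namespace {  // unnamed namespace: members get internal linkage (since C++11)
constexpr bool is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}
}  // namespace

// Callable from any translation unit that declares it inside namespace text.
int count_words(const char* s) {
    int words = 0;
    bool in_word = false;
    for (; *s != '\0'; ++s) {
        if (is_space(*s)) {
            in_word = false;       // a separator ends the current word
        } else if (!in_word) {
            in_word = true;        // first character of a new word
            ++words;
        }
    }
    return words;
}

}  // namespace text
```

Inside this translation unit the helper can still be named as `text::is_space`, because the unnamed namespace is implicitly made visible in its enclosing namespace.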

Den-Jason
Serge Ballesta
  • @BenVoigt: Not true since C++11; `namespace {...}` grants the same linkage as `static` (but can apply it to types as well as functions/variables). – Davis Herring Aug 25 '23 at 07:41
  • It has never been common for functions to be defined in C header files, so why do you think that's the original reason for this default? – Barmar Aug 25 '23 at 15:34
  • @Barmar I was thinking of function *declarations*, which *are* common in include files. And a function definition is just a declaration which happens to contain the function body. It is probably simpler at the compiler level to have the same defaults in a pure declaration and in the declaration part of a definition. – Serge Ballesta Aug 25 '23 at 17:24
  • @Barmar at the time C was created, local linkage was very uncommon. Name visibility in existing languages and asm mnemonics was global (among high-level languages, Pascal had some segregation). So I think it's a rule of least surprise. Header files were an extension of functionality; we could do without them. – Swift - Friday Pie Aug 28 '23 at 18:12


You would be hard pressed to find out why the default is "export everything". The language and its compilers have both evolved dramatically since its inception in the 1970s, and no release notes or "working group" discussions from that era are available on the internet. "Structured programming" and goto statements were of the time; very few people were thinking about using encapsulation to minimise the shared-state complexity problem. Fortran also made functions publicly visible.

I would surmise that as the language grew in popularity, ever larger systems emerged which may have broken early editions of the linker, so some means of circumventing this needed to be introduced. For some crazy reason they chose to use static to hide functions from the linker to reduce its load (to me, this is a bigger mystery than why linkage is arbitrarily public).


In practice, declaring functions static does more than deny other modules access to "internals": in very large programs, hiding symbols from the linker can speed up the build and reduce memory consumption. Marking everything static can become unwieldy very quickly, though. Instead of sprinkling the codebase with static to conceal functions, it can make more sense to use a compiler option that sets hidden visibility as the default, then decorate the functions you do want to be visible to other modules. Note that this works at a different boundary than static. Visibility controls what a shared library exports, while static restricts a name to a single translation unit.

In Linux you can direct the compiler to make hidden visibility the default (-fvisibility=hidden): see https://stackoverflow.com/a/52742992/1607937
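That approach can be sketched as follows. It assumes GCC or Clang on an ELF platform such as Linux. `MYLIB_API`, the file names and the functions are hypothetical, not a standard macro or API:

```cpp
// mylib.cpp: assumed build command (GCC or Clang, ELF target):
//   g++ -fPIC -shared -fvisibility=hidden mylib.cpp -o libmylib.so

// Hypothetical export macro: restores default visibility for the functions
// that form the library's public interface.
#if defined(__GNUC__)
#define MYLIB_API __attribute__((visibility("default")))
#else
#define MYLIB_API
#endif

// No annotation: under -fvisibility=hidden it is not exported from libmylib.so.
constexpr int checksum_step(int acc, unsigned char byte) {
    return (acc * 31 + byte) & 0xFFFF;
}

// Annotated: placed in the dynamic symbol table of libmylib.so.
MYLIB_API int checksum(const unsigned char* data, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc = checksum_step(acc, data[i]);
    }
    return acc;
}
```

For an executable that is not a shared library, this changes nothing. That is the distinction Peter Cordes draws in the comments.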

In truth it's a little bit more complicated than that; there are other options that provide finer tuning of visibility. From https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html:

visibility ("visibility_type") The visibility attribute on ELF targets causes the declaration to be emitted with default, hidden, protected or internal visibility.

          void __attribute__ ((visibility ("protected")))
          f () { /* Do something. */; }
          int i __attribute__ ((visibility ("hidden")));

See the ELF gABI for complete details, but the short story is:

default

Default visibility is the normal case for ELF. This value is available for the visibility attribute to override other options that may change the assumed visibility of symbols.

hidden

Hidden visibility indicates that the symbol will not be placed into the dynamic symbol table, so no other module (executable or shared library) can reference it directly.

internal

Internal visibility is like hidden visibility, but with additional processor specific semantics. Unless otherwise specified by the psABI, GCC defines internal visibility to mean that the function is never called from another module. Note that hidden symbols, while they cannot be referenced directly by other modules, can be referenced indirectly via function pointers.

By indicating that a symbol cannot be called from outside the module, GCC may for instance omit the load of a PIC register since it is known that the calling function loaded the correct value.

protected

Protected visibility indicates that the symbol will be placed in the dynamic symbol table, but that references within the defining module will bind to the local symbol. That is, the symbol cannot be overridden by another module.

Not all ELF targets support this attribute.

(also see Peter Cordes' comment in the thread)


Also note that functions can be overridden by "bolt-on" implementations that can be linked in. This is useful for mocking methods in unit tests. It's worth using the "weak linkage" attribute if you intend to use this.
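The weak-linkage pattern can be sketched as follows. It assumes GCC or Clang on an ELF target, and `read_sensor` and `sensor_doubled` are hypothetical. Replacing a weak definition with a strong one at link time is a toolchain feature, not something ISO C or C++ guarantees:

```cpp
// sensor.cpp: production code (hypothetical).
// `weak` lets the linker discard this definition if another object file in
// the same link provides a normal (strong) definition of the same name.
__attribute__((weak)) int read_sensor() {
    return 42;  // placeholder for real hardware access
}

// Calls whichever read_sensor the linker kept; GCC does not inline a weak
// definition, since it may be replaced at link time.
int sensor_doubled() { return 2 * read_sensor(); }

// sensor_mock.cpp: linked only into the unit-test binary:
//   int read_sensor() { return 7; }  // strong definition wins over the weak one
```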


It's worth mentioning that in C++ it is preferred to use anonymous namespaces instead of static to declare symbols as being "private":

namespace {
    <module-private code>
} // anonymous namespace

See core guidelines SF.22 - https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rs-unnamed2

In my experience, many companies embrace this in their coding standards.

Note that this was not always exactly equivalent to "static":

In C++03, a function defined in an anonymous namespace had external linkage. It was merely unreachable from other translation units because it lived in a uniquely named scope that cannot be referred to elsewhere. Since C++11, names in an anonymous namespace have internal linkage, just like static ones (see Davis Herring's comments below).

... so for very large C++ programs it's still worth using -fvisibility=hidden and decorating the functions you do want to be visible outside the shared library. Anonymous namespaces only cover functions private to a single translation unit, not functions shared between the translation units of one library.

Den-Jason
  • This appears to be a comment rather than an answer to the question. – Cubic Aug 24 '23 at 08:17
  • This doesn't answer the question. – Lundin Aug 24 '23 at 08:19
  • @Lundin the "-fvisibility=hidden" part does. – Den-Jason Aug 24 '23 at 08:27
  • `static` and anonymous namespace are not the same thing. A function defined in an anonymous namespace will have **external linkage**. But it is guaranteed to live in a uniquely named scope. Indeed we can't refer to it outside of the translation unit it is defined in because it is unnamed. – Fareanor Aug 24 '23 at 08:42
  • ELF gABI and linkage in C++ are orthogonal. Static functions (or variables) may have no name at all at the gABI level. The current implementation of anonymous-namespace ones provides a name, though; it's visible to the linker. – Swift - Friday Pie Aug 24 '23 at 08:56
  • 10 edits later of this answer and the question to answer is still "**Why** doesn't free functions in implementation files have internal linkage by default". Not **how** to make that happen in various ways. – Lundin Aug 24 '23 at 09:21
  • @Lundin I state `I don't know why the default is "export everything"; I can only surmise this was the case in early versions of the compilers which then became a legacy.` Also I note you made a similar comment to that effect yourself. – Den-Jason Aug 24 '23 at 09:25
  • If you don't know then you probably shouldn't post an answer. I don't, so I didn't post an answer. I'm not sure if anyone actually knows, researching the history of C and design decisions along the way is never easy unless you come across some veteran who was actually there when it happened. – Lundin Aug 24 '23 at 09:32
  • ...in which case there should probably be _no_ answers to this question. – Den-Jason Aug 24 '23 at 09:34
  • Anyway, it's been useful to learn that anonymous namespaces are still visible to the linker if compiled with visibility default. That would not have happened had I not posted this answer. – Den-Jason Aug 24 '23 at 09:37
  • `hidden`/`internal` are significantly different from `static`; ELF visibility is about visibility outside the final *shared library* you might link these `.o` files into. i.e. one linker output. While `static` is about visibility across single translation units (`.c` source files). If you're just compiling an executable, not a shared library, the visibility options make no difference. So yes, visibility is a problem across larger boundaries as well (between `.so` shared libraries), but your answer is written like it's the same as `static`, not making the distinction at all. – Peter Cordes Aug 24 '23 at 20:34
  • @PeterCordes feel free to make an edit as you see fit. – Den-Jason Aug 24 '23 at 21:29
  • @Fareanor: That was the C++03 rule; since then it's been changed to confer internal linkage just like `static`. – Davis Herring Aug 25 '23 at 07:37
  • @DavisHerring no, it's not quite the same? And I don't know any implementation that treats it as such. Implicitly adding the enclosing namespace by `using namespace` adds the anonymous namespace for name lookup (not Koenig lookup though). – Swift - Friday Pie Aug 26 '23 at 12:21
  • @Swift-FridayPie: It [really is](https://en.cppreference.com/w/cpp/language/storage_duration#Internal_linkage); the implicit *using-directive* is real but seems unrelated. – Davis Herring Aug 26 '23 at 16:32


Functions declared with the static keyword at global namespace scope have internal linkage. This means that

a) if they are declared in a .cpp file, they cannot be accessed from any other compilation unit (other .cpp file).

b) if they are defined in a header, there is a separate copy of each function in every compilation unit that includes that header.

c) if they are declared in a module, they cannot be accessed from anywhere else.

Why is the language designed this way? It was the original decision, in both C and C++. In C, header files were secondary, an optional item: you can link a program with zero header files in it. In C++, functions must be declared in the source code before use; in C you didn't even need that.

C++ uses the same strategy. You could say it follows the principle of least surprise: it would be unexpected for those functions to have internal linkage by default and to require an "extern" keyword (or "export", or some other extended abomination). In C++, anonymous namespaces act as the closest analog to "default internal linkage".

Swift - Friday Pie

The compiler doesn't see "definition in source file, no declarations in header". All it sees is "definition in translation unit". Under your scheme you'd need to give external linkage to every function you intend to use in multiple translation units.

The default makes lots of sense for C, where there are only free functions, and that was kept in C++ for backwards compatibility.

Caleth
  • It doesn't really make sense in C either. External linkage has the downside that it pollutes the global namespace and makes functions harder to inline. I don't know if it was a conscious design decision or not, but with the power of hindsight, it's definitely a mistake. – user694733 Aug 24 '23 at 08:17
  • @user694733 see molbdnilo's comment on your question: historically, a large proportion of functions are intended to be used from outside the translation unit they are defined in. – Caleth Aug 24 '23 at 08:35


If it is desirable to let programmers declare two slightly different forms of a construct, using a syntactic marker to distinguish an "alternate" form from a primary form, there are at least two sensible ways to decide which form should be the primary one:

  1. If one form will be used more than the other, make that the primary form.

  2. If the language would be useless without one form, but would be at least somewhat usable without the other, make the first one the primary form.

If one is trying to minimize the amount of effort required to "bootstrap" a compiler onto a new platform, one should seek to omit things that aren't absolutely necessary to get a minimal compiler up and running. If a compiler will be generating code that needs to interact with any other code that's already up and running, support for external linkage will be absolutely required. Support for internal linkage may be nice, but far less necessary.

supercat
  • The OP asserts that internal linkage should be the norm. I'm not saying I agree with them, but in light of these criteria, perhaps that's a key point to address? – John Bollinger Aug 25 '23 at 22:01
  • A compiler which didn't support any kind of external linkage would be rather useless, but I guess I forgot to mention that. – supercat Aug 25 '23 at 22:02
  • I agree, @supercat, and that was clear to me. My point is that that leaves your two criteria at odds with each other according to the OP. – John Bollinger Aug 25 '23 at 22:10
  • @JohnBollinger: The fact that the criteria would call for different actions means that one's choice of action depends upon which criterion one views as more important. Presumably, the second was more important in the design of C. – supercat Aug 25 '23 at 22:14
  • Ok, fair enough. – John Bollinger Aug 25 '23 at 22:16