140

I had an interview recently and one question asked was what is the use of extern "C" in C++ code. I replied that it is to use C functions in C++ code as C doesn't use name-mangling. I was asked why C doesn't use name-mangling and to be honest I couldn't answer.

I understand that when the C++ compiler compiles functions, it gives a special name to the function mainly because we can have overloaded functions of the same name in C++ which must be resolved at compile time. In C, the name of the function will stay the same, or maybe with an _ before it.

My query is: what's wrong with allowing the C++ compiler to mangle C functions also? I would have assumed that it doesn't matter what names the compiler gives to them. We call functions in the same way in C and C++.

Jonathan Leffler
Engineer999
  • 77
    C doesn't *need* to mangle the names, because it doesn't have function overloading. – EOF Apr 14 '16 at 11:36
  • 9
    How do you link C libraries with C++ code if the C++ compiler mangles the function names? – Mat Apr 14 '16 at 11:37
  • but what's wrong with letting the compiler just mangle them anyways? Why would it affect our program? We wouldn't need extern "C" then. – Engineer999 Apr 14 '16 at 11:38
  • 6
    " I replied that it is to use C functions in C++ code as C doesn't use name-mangling." - I think it is the other way around. Extern "C" makes the C++ functions usable in a C compiler. [source](http://stackoverflow.com/questions/1041866/in-c-source-what-is-the-effect-of-extern-c) – rozina Apr 14 '16 at 11:40
  • 3
    @Engineer999: And if you compile the subset of C that is also C++ with a C++ compiler, the function names will indeed get mangled. But if you want to be able to link binaries created with different compilers, you don't want name mangling. – EOF Apr 14 '16 at 11:40
  • Does the C++ compiler mangle the function names even on declaration? – Engineer999 Apr 14 '16 at 11:41
  • @Engineer999: Yes. That's in fact the whole point here. The definition is in a C library, compiled by a C compiler. The C++ compiler sees only the `extern "C"` declaration. – MSalters Apr 14 '16 at 11:45
  • But if I got the source code for the C libraries and compiled my application along with the library sources together with a C++ compiler, there should be no problem with name-mangling right? – Engineer999 Apr 14 '16 at 11:50
  • Mangling usually happens in order to allow function overloading. Since C doesn't allow that anyway, there is no point in mangling C functions. – David Haim Apr 14 '16 at 11:51
  • 2
    As a side note to all of this, C compilers sometimes create internal functions out of your code. The reason could be optimization, for example it wants to optimize a switch statement by replacing it with an array of function pointers. Another reason could be that you are writing C code for something the hardware doesn't support, that is for example using 32 bit integers on a 8 bit CPU, or using float numbers on a system without FPU etc etc. The compiler will inject functions in your code, that are given some cryptic names. And these internal functions could be name mangled among themselves. – Lundin Apr 14 '16 at 11:53
  • What about linking in C++ compiled libraries? When the compiler is stepping through and compiling our code which calls one of the functions in a c++ compiled library, how does it know which name to mangle or give to the function on just seeing its declaration or function call? – Engineer999 Apr 14 '16 at 12:02
  • You may as well ask 'why does C not have a string type with creedence?' – Martin James Apr 14 '16 at 12:25
  • 13
    C **does** mangle names. Typically the mangled name is the name of the function preceded by an underscore. Sometimes it's the name of the function followed by an underscore. `extern "C"` says to mangle the name the same way that "the" C compiler would. – Pete Becker Apr 14 '16 at 13:13
  • @DavidHaim: I wonder if there would be any fundamental difficulty with C allowing overloading of inline functions? The compiler is allowed to name those however it likes, and anything that could be done with overloadable extern functions could be done by using inline functions which chain to differently-named extern functions. Being able to have a compiler select overloads could make some kinds of code more efficient, especially if overload selection could take constant arguments into account (e.g. allow a function `foo(int x, int y)` also have an inline overload `foo(int x, int 0)`... – supercat Apr 14 '16 at 14:38
  • ...which calls `foo_zero(x);`. While having `foo(int,int)` start with `if(y==0) foo_zero(x); else {...}` might work well when `y` is passed a compile-time constant zero, such code may be a waste of time when `y` is a variable. At present, though, the only way to make a compiler intelligently generate code in such cases is to use some ugly macros and non-standard intrinsics. – supercat Apr 14 '16 at 14:43
  • 2
    It may be worth pointing out that C++ name mangling does not *just* allow overloading, it also provides type-safety. A C compiler could presumably also provide such type-safety, but I guess people are too entrenched in old ABIs by now – Arvid Apr 14 '16 at 15:06
  • 1
    `extern "C++"` really _ought_ to have been in C99, if only as an optional feature, but neither the C and C++ committees nor compiler vendors seem to be interested. – zwol Apr 14 '16 at 15:32
  • @Arvid: But if you added type safety to C, would it still be C? And how much existing (and yet to be written) code depends on not being type safe? – jamesqf Apr 14 '16 at 17:33
  • 2
    @Pete Becker, do you have an authoritative reference saying that C does mangling? I am not referring to adding the underscore, I mean something that uses the term "mangle" and calls it mangling, – Sam Hobbs Apr 14 '16 at 21:02
  • 1
    @Kaz: Forget about apes, why do humans with no dependents buy life insurance? (Other than having been persuaded into it by a slick-talking salesperson, which IMHO explains a lot of C++ :-)) If you want name-mangling, type safety, and so on, just rename your source files to *.cpp, change the compile line in your Makefile, and you're pretty much good to go. – jamesqf Apr 14 '16 at 21:52
  • 1
    @PeteBecker *C does mangle names. Typically the mangled name is the name of the function preceded by an underscore.* No, **C** does *not* mangle names. [Microsoft does](https://en.wikipedia.org/wiki/Name_mangling#C_name_decoration_in_Microsoft_Windows) in order to specify the calling convention. – Andrew Henle Apr 16 '16 at 14:14
  • @supercat It's hard to treat `inline` functions in any special manner given that `inline` is implementation defined and *Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined.* I'd venture to guess that it's a bit hard to do anything definitive with that as a basis. – Andrew Henle Apr 16 '16 at 14:25
  • @AndrewHenle: I should have said "static" or "static inline", i.e. names which need not be exposed to outside code. – supercat Apr 17 '16 at 14:43
  • 1
    I would say that even prepending an underscore is a slight mangling. In the C compiler on Unix V7 this was done to avoid name collisions with the cpu registers when generating the assembly source files. The registers are named r0-r5, sp and pc. Currently many gnu assemblers seem to mangle the cpu registers, by naming them (very uglyly) %r0-%r5, %sp and %pc (etc). – Olaf Seibert Apr 18 '16 at 13:29

9 Answers

193

It was sort of answered above, but I'll try to put things into context.

First, C came first. As such, what C does is, sort of, the "default". It does not mangle names because it just doesn't. A function name is a function name. A global is a global, and so on.

Then C++ came along. C++ wanted to be able to use the same linker as C, and to be able to link with code written in C. But C++ could not leave the C "mangling" (or lack thereof) as is. Check out the following example:

int function(int a);
int function();

In C++, these are distinct functions, with distinct bodies. If neither of them is mangled, both will be called "function" (or "_function"), and the linker will complain about the redefinition of a symbol. C++'s solution was to mangle the argument types into the function name. So, one is called _function_int and the other is called _function_void (not the actual mangling scheme) and the collision is avoided.
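To make the collision concrete, here is a minimal sketch of the two overloads. The bodies and return values are made up for illustration. The symbol names in the comments are what the Itanium C++ ABI (used by GCC and Clang on Linux) produces, rather than the `_function_int` placeholder used above; other compilers, such as MSVC, use a different scheme.

```cpp
#include <cassert>

// Two overloads sharing one source-level name (bodies are illustrative).
int function(int a) { return a + 1; }  // Itanium ABI symbol: _Z8functioni
int function()      { return 0; }      // Itanium ABI symbol: _Z8functionv

// Overload resolution picks a symbol at compile time from the argument list.
void check_overloads() {
    assert(function(41) == 42);  // resolves to function(int)
    assert(function() == 0);     // resolves to function()
}
```

Compiling this to an object file with GCC or Clang and listing it with `nm` should show both symbols; piping the output through `c++filt` turns them back into `function(int)` and `function()`.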

Now we're left with a problem. If int function(int a) was defined in a C module, and we're merely taking its header (i.e. declaration) in C++ code and using it, the compiler will generate an instruction to the linker to import _function_int. When the function was defined, in the C module, it was not called that. It was called _function. This will cause a linker error.

To avoid that error, during the declaration of the function, we tell the compiler it is a function designed to be linked with, or compiled by, a C compiler:

extern "C" int function(int a);

The C++ compiler now knows to import _function rather than _function_int, and all is well.
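In practice the `extern "C"` usually lives in the C library's header, guarded so that the same header works for both languages. A minimal sketch follows; the body of `function` is invented here only so the example is self-contained (in real use the definition would sit in a separately compiled C file), and `check_c_linkage` is a hypothetical helper.

```cpp
#include <cassert>

// --- what a shared header would contain ---
#ifdef __cplusplus
extern "C" {            // C++ only: use C linkage, so no C++ name mangling
#endif

int function(int a);    // a C compiler sees just this plain declaration

#ifdef __cplusplus
}
#endif

// --- stand-in for the C library's definition (illustrative body) ---
// It redeclares the function above, so it keeps the same C linkage.
int function(int a) { return a * 2; }

void check_c_linkage() {
    assert(function(21) == 42);
}
```

Because a C compiler never defines `__cplusplus`, it skips the `extern "C"` lines, which it would otherwise reject as a syntax error.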

David G
Shachar Shemesh
  • 1
    @ShacharShamesh: I've asked this elsewhere, but what about linking in C++ compiled libraries? When the compiler is stepping through and compiling my code which calls one of the functions in a C++ compiled library, how does it know which name to mangle or give to the function on just seeing its declaration or function call? How does it know that, where the function is defined, it is name-mangled to something else? So there must be a standard name-mangling method in C++? – Engineer999 Apr 14 '16 at 14:01
  • 2
    Every compiler does it in its own special way. If you're compiling everything with the same compiler it doesn't matter. But if you try to use, say, a library that was compiled with Borland's compiler, from a program that you're building with Microsoft's compiler, well... good luck; you'll need it :) – Mark VY Apr 14 '16 at 16:37
  • 9
    @Engineer999 Ever wondered why there is no such thing as portable C++ libraries, but they either specify exactly what version (and flags) of the compiler (and standard library) you have to use or just export a C API? There you go. C++ is pretty much the least portable language ever invented, while C is the exact opposite. There are efforts in that regard, but for now if you want something that's truly portable you'll stick with C. – Voo Apr 14 '16 at 17:42
  • 1
    @Voo Well, in theory you should be able to write portable code just by adhering to the standard e.g. `-std=c++11`, and avoid the use of anything outside the standard. That's the same as declaring a Java version (although newer Java versions are backward compatible). It's not the standards fault people use compiler specific extensions and platform dependent code. On the other hand, you can't blame them, as there are a lot of things (esp. IO, like sockets) missing in the standard. The committee looks to be slowly catching up to that. Correct me if I missed something. – mucaho Apr 14 '16 at 23:09
  • 16
    @mucaho: you're talking about source portability / compatibility. i.e. the API. Voo is talking about *binary* compatibility, without a re-compile. This requires **ABI compatibility**. C++ compilers regularly change their ABI between versions. (e.g. g++ doesn't even try to have a stable ABI. I assume they don't break the ABI just for fun, but they don't avoid changes that require an ABI change when there's something to be gained and no other good way to do it.). – Peter Cordes Apr 15 '16 at 03:36
  • 1
    There were attempts to create a stable C++ ABI, and, in theory, code compiled by gcc can be linked by LLVM and the Intel compiler. In practice, nobody knows whether that works, because nobody seems to trust the ABI enough to try in any large scale project. Add to that the fact that C++11 isn't ABI compatible with earlier versions, and there you have it. – Shachar Shemesh Apr 15 '16 at 11:29
45

It's not that they "can't", they aren't, in general.

If you want to call a function in a C library called foo(int x, const char *y), it's no good letting your C++ compiler mangle that into foo_I_cCP() (or whatever, just made up a mangling scheme on the spot here) just because it can.

That name won't resolve: the function is in C and its name does not depend on its list of argument types. So the C++ compiler has to know this, and mark that function as being C to avoid doing the mangling.

Remember that said C function might be in a library whose source code you don't have; all you have is the pre-compiled binary and the header. So your C++ compiler can't do "its own thing"; it can't change what's in the library, after all.

unwind
  • This is the part I am missing. Why would the C++ compiler mangle a function name when it just sees its declaration or sees it being called? Does it not just mangle function names when it sees their implementation? This would make more sense to me. – Engineer999 Apr 14 '16 at 11:44
  • 13
    @Engineer999: How can you have one name for the definition and another for the declaration? _"There's a function called Brian that you can call." "Okay I'll call Brian." "Sorry, there is no function called Brian."_ Turns out it's called Graham. – Lightness Races in Orbit Apr 14 '16 at 11:49
  • What about linking in C++ compiled libraries? When the compiler is stepping through and compiling our code which calls one of the functions in a C++ compiled library, how does it know which name to mangle or give to the function on just seeing its declaration or function call? – Engineer999 Apr 14 '16 at 12:05
  • 1
    @Engineer999 Both must agree on the same mangling. So they see the header file (remember, there's very little metadata in native DLLs - headers are that metadata), and go "Ah, right, Brian should really be Graham". If this doesn't work (e.g. with two incompatible mangling schemes), you're not going to get a correct link and your application is going to fail. C++ has a lot of incompatibilities like this. In practice, you then have to explicitly use the mangled name and disable mangling on your side (e.g. you tell your code to execute Graham, not Brian). In *actual* practice... `extern "C"` :) – Luaan Apr 14 '16 at 13:26
  • 1
    @Engineer999 I might be wrong, but do you perhaps have experience with languages like Visual Basic, C# or Java (or even Pascal/Delphi to an extent)? Those make interop seem extremely simple. In C and especially C++, it's anything but. There's plenty of calling conventions you need to honor, you need to know who's responsible for what memory, and you must have the header files that tell you the function declarations, since the DLLs themselves don't contain enough information - especially in the case of pure C. If you don't have a header file, you generally need to decompile the DLL to use it. – Luaan Apr 14 '16 at 13:29
  • @Luaan I think that half of the problems you talk about are a result of bad code engineering, and for the other half the solutions are quite elegant already. – szpanczyk Jan 05 '21 at 09:48
32

what's wrong with allowing the C++ compiler to mangle C functions also?

They wouldn't be C functions any more.

A function is not just a signature and a definition; how a function works is largely determined by factors like the calling convention. The "Application Binary Interface" specified for use on your platform describes how systems talk to each other. The C++ ABI in use by your system specifies a name mangling scheme, so that programs on that system know how to invoke functions in libraries and so forth. (Read the C++ Itanium ABI for a great example. You'll very quickly see why it's necessary.)

The same applies for the C ABI on your system. Some C ABIs do actually have a name mangling scheme (e.g. Visual Studio), so this is less about "turning off name mangling" and more about switching from the C++ ABI to the C ABI, for certain functions. We mark C functions as being C functions, to which the C ABI (rather than the C++ ABI) is pertinent. The declaration must match the definition (be it in the same project or in some third-party library), otherwise the declaration is pointless. Without that, your system simply won't know how to locate/invoke those functions.

As for why platforms don't define C and C++ ABIs to be the same and get rid of this "problem", that's partially historical: the original C ABIs weren't sufficient for C++, which has namespaces, classes and operator overloading, all of which need to somehow be represented in a symbol's name in a computer-friendly manner. One might also argue that making C programs now abide by the C++ ABI would be unfair on the C community, which would have to put up with a massively more complicated ABI just for the sake of some other people who want interoperability.
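One practical consequence of switching individual functions to the C ABI is the common pattern of exposing C++ code through a C interface, so that other compilers and languages can call it. A rough sketch, where `Counter`, the `counter_*` functions and `check_counter` are hypothetical names made up for this illustration:

```cpp
#include <cassert>

// C++ implementation detail; never visible to callers of the C interface.
class Counter {
public:
    void increment() { ++value_; }
    int value() const { return value_; }
private:
    int value_ = 0;
};

// C-ABI surface: unmangled names, and only an opaque pointer crosses it.
extern "C" {
    void* counter_create() { return new Counter(); }
    void counter_increment(void* handle) {
        static_cast<Counter*>(handle)->increment();
    }
    int counter_value(const void* handle) {
        return static_cast<const Counter*>(handle)->value();
    }
    void counter_destroy(void* handle) {
        delete static_cast<Counter*>(handle);
    }
}

void check_counter() {
    void* c = counter_create();
    counter_increment(c);
    counter_increment(c);
    assert(counter_value(c) == 2);
    counter_destroy(c);
}
```

A real library would also need to keep C++ exceptions from escaping through these functions, which this sketch does not attempt.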

Lightness Races in Orbit
  • 2
    `+int(PI/3)`, but with one grain of salt: I'd be very cautious to speak of "C++ ABI"... AFAIK, there are *attempts* at defining C++ ABIs, but no **real** *de facto* / *de jure* standards - as https://isocpp.org/files/papers/n4028.pdf states (and I wholeheartedly agree), quote, *it is deeply ironic that C++ actually has always supported a way to publish an API with a stable binary ABI—by resorting to the C subset of C++ via extern “C”.*. `C++ Itanium ABI` is just that - *some* C++ ABI for Itanium... as discussed on http://stackoverflow.com/questions/7492180/c-abi-issues-list –  Apr 14 '16 at 16:06
  • 3
    @vaxquis: Yeah, not "C++'s ABI", but "a C++ ABI" in the same way that I have a "house key" that doesn't work on every house. Guess it could be clearer, though I tried to make it as clear as possible by starting off with the phrase _"The C++ ABI **in use by your system**"_. I dropped the clarifier in later utterances for brevity, but I'll accept an edit that reduces confusion here! – Lightness Races in Orbit Apr 14 '16 at 17:00
  • 2
    AIUI C abi's tended to be a property of a platform while C++ ABIs tended to be a property of an individual compiler and often even a property of an individual version of a compiler. So if you wanted to link between modules built with different vendors tools you had to use a C abi for the interface. – plugwash Apr 15 '16 at 02:13
  • The statement "name-mangled functions would not be C functions any more" is exaggerated -- it's perfectly possible to call name-mangled functions from plain vanilla C if the mangled name is known. That the name changes doesn't make it any less adherent to the C ABI, i.e doesn't make it any less a C function. The other way round makes more sense -- C++ code couldn't call a C function without declaring it "C" because it would do name mangling *when attempting to link against the callee.* – Peter - Reinstate Monica Apr 17 '16 at 12:06
  • @PeterA.Schneider: Yes, the headline phrase is exaggerated. The _entire rest of the answer_ contains the pertinent factual detail. – Lightness Races in Orbit Apr 17 '16 at 12:50
21

MSVC in fact does mangle C names, although in a simple fashion. It sometimes appends @4 or another small number. This relates to calling conventions and the need for stack cleanup.

So the premise is just flawed.

MSalters
  • Yes, but this is done **only** for `__stdcall` and `__fastcall` functions which is **not a standard C calling** method (which remains the old glorious **"caller cleans the stack"**).. – Frankie_C Apr 14 '16 at 11:51
  • 2
    That's not really name mangling. It is simply a vendor specific naming (or name adorning) convention to prevent issues with executables being linked to DLLs built with the functions having different calling conventions. – Peter Apr 14 '16 at 11:56
  • 3
    What about prepending with a `_`? – OrangeDog Apr 14 '16 at 12:32
  • 13
    @Peter: Literally the same thing. – Lightness Races in Orbit Apr 14 '16 at 12:37
  • 7
    @Frankie_C: "Caller cleans the stack" is not specified by any C standard: neither calling convention is more standard than the other from a language perspective. – Ben Voigt Apr 14 '16 at 15:02
  • 2
    And from an MSVC perspective, the "standard calling convention" is just what you pick from `/Gd, /Gr, /Gv, /Gz`. (That is to say, the standard calling convention is what's used unless a function declaration explicitly specifies a calling convention.). You're thinking of `__cdecl` which is the default standard calling convention. – MSalters Apr 14 '16 at 15:16
  • You are all right. After so many years the real sense of the original stack handling has been lost. I should have said the **original historic ABI**. On actual systems the standard calling convention is whatever the platform defines. – Frankie_C Apr 15 '16 at 09:47
13

It's very common to have programs which are partially written in C and partially written in some other language (often assembly language, but sometimes Pascal, FORTRAN, or something else). It's also common to have programs contain different components written by different people who may not have the source code for everything.

On most platforms, there is a specification, often called an ABI (Application Binary Interface), which describes what a compiler must do to produce a function with a particular name which accepts arguments of some particular types and returns a value of some particular type. In some cases, an ABI may define more than one "calling convention"; compilers for such systems often provide a means of indicating which calling convention should be used for a particular function. For example, on the Macintosh, most Toolbox routines use the Pascal calling convention, so the prototype for something like "LineTo" would be something like:

/* Note that there are no underscores before the "pascal" keyword because
   the Toolbox was written in the early 1980s, before the Standard and its
   underscore convention were published */
pascal void LineTo(short x, short y);

If all of the code in a project was compiled using the same compiler, it wouldn't matter what name the compiler exported for each function, but in many situations it will be necessary for C code to call functions that were compiled using other tools and cannot be recompiled with the present compiler [and may very well not even be in C]. Being able to define the linker name is thus critical to the use of such functions.

supercat
  • Yes, that's the answer. If it is just C and C++ then it is difficult to understand why it is done that way. To understand we must put things in the context of the old way of statically linking. Static linking seems primitive to Windows programmers but it is the primary reason C **cannot** mangle names. – Sam Hobbs Apr 14 '16 at 21:08
  • 2
    @user34660: Not quite. It's the reason that C cannot mandate the existence of features whose implementation would require either mangling exportable names, or allowing the existence of multiple like-named symbols that are distinguished by secondary characteristics. – supercat Apr 14 '16 at 23:11
  • do we know that there were attempts to "mandate" such things or that such things were extensions available for C before C++? – Sam Hobbs Apr 15 '16 at 00:34
  • @user34660: Re "Static linking seems primitive to Windows programmers...", but dynamic linking sometimes seems like a major PITA to people using Linux, when installing program X (probably written in C++) means having to track down and install particular versions of libraries that you already have different versions of on your system. – jamesqf Apr 16 '16 at 17:34
  • @jamesqf, yes, Unix did not have dynamic linking before Windows. I know very little about dynamic linking in Unix/Linux but it sounds like it is not as seamless as it could be in an operating system generally. – Sam Hobbs Apr 16 '16 at 18:08
  • @user34660: I don't know for sure, but I'm guessing it's more of a language problem than fundamental to the OS. See comments about C++ frequently changing its ABI. I don't recall ever having problems with C code. – jamesqf Apr 17 '16 at 05:05
13

I'll add one other answer, to address some of the tangential discussions that took place.

The C ABI (application binary interface) originally called for passing arguments on the stack in reverse order (i.e. pushed from right to left), where the caller also frees the stack storage. Modern ABIs actually use registers for passing arguments, but many of the mangling considerations go back to that original stack argument passing.

The original Pascal ABI, in contrast, pushed the arguments from left to right, and the callee had to pop the arguments. The original C ABI is superior to the original Pascal ABI in two important points. The first is that the argument push order means the stack offset of the first argument is always known, allowing functions that take an unknown number of arguments, where the early arguments control how many other arguments there are (à la printf).

The second way in which the C ABI is superior is the behavior in case the caller and callee do not agree on how many arguments there are. In the C case, so long as you don't actually access arguments past the last one, nothing bad happens. In Pascal, the wrong number of arguments is popped from the stack, and the entire stack is corrupted.
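The first point, an early argument telling the callee how many more arguments follow, can be sketched with the standard `<cstdarg>` facilities. `sum_ints` and `check_sum_ints` are hypothetical names chosen for this example:

```cpp
#include <cassert>
#include <cstdarg>

// The first (fixed) argument tells the callee how many ints follow,
// much as printf's format string describes its remaining arguments.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // read the next int argument
    }
    va_end(args);
    return total;
}

void check_sum_ints() {
    assert(sum_ints(3, 1, 2, 3) == 6);
    assert(sum_ints(0) == 0);
}
```

The callee cannot verify `count` against what was actually passed, so only the caller knows how much argument storage to release afterwards, which is why this style goes together with caller cleanup.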

The original Windows 3.1 ABI was based on Pascal. As such, it used the Pascal ABI (arguments in left to right order, callee pops). Since any mismatch in argument number might lead to stack corruption, a mangling scheme was formed. Each function name was mangled with a number indicating the size, in bytes, of its arguments. So, on a 16-bit machine, the following function (C syntax):

int function(int a)

was mangled to function@2, because int is two bytes wide. This was done so that if the declaration and definition mismatch, the linker will fail to find the function rather than corrupt the stack at run time. Conversely, if the program links, then you can be sure the correct number of bytes is popped from the stack at the end of the call.

32 bit Windows and onward use the stdcall ABI instead. It is similar to the Pascal ABI, except push order is like in C, from right to left. Like the Pascal ABI, the name mangling mangles the arguments byte size into the function name to avoid stack corruption.
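The decoration rule for 32-bit `__stdcall` functions is mechanical: a leading underscore, the name, an `@`, then the total size of the arguments in bytes. The helper below is a hypothetical illustration that only builds such a string; it is not how a compiler implements decoration, and it assumes the caller has already rounded each argument up to a multiple of 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the 32-bit __stdcall decorated name
// "_name@N", where N is the number of argument bytes the callee pops.
std::string stdcall_decorated_name(const std::string& name,
                                   std::size_t arg_bytes) {
    return "_" + name + "@" + std::to_string(arg_bytes);
}

void check_decoration() {
    // int __stdcall function(int a): one 4-byte argument.
    assert(stdcall_decorated_name("function", 4) == "_function@4");
    // A declaration with two int parameters looks for a different symbol,
    // so a mismatch fails at link time instead of corrupting the stack.
    assert(stdcall_decorated_name("function", 8) == "_function@8");
}
```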

Unlike claims made elsewhere here, the C ABI does not mangle the function names, even on Visual Studio. Conversely, mangling functions decorated with the stdcall ABI specification isn't unique to VS. GCC also supports this ABI, even when compiling for Linux. This is used extensively by Wine, which uses its own loader to allow run-time linking of Linux-compiled binaries to Windows-compiled DLLs.

Shachar Shemesh
9

C++ compilers use name mangling in order to allow for unique symbol names for overloaded functions whose names would otherwise be the same. It basically encodes the types of the arguments as well, which allows for polymorphism on a function-based level.

C does not require this since it does not allow for the overloading of functions.

Note that name mangling is one (but certainly not the only!) reason that one cannot rely on a 'C++ ABI'.

dgrine
8

C++ wants to be able to interop with C code that links against it, or that it links against.

C expects non-name-mangled function names.

If C++ mangled it, it would not find the exported non-mangled functions from C, or C would not find the functions C++ exported. The C linker must get the name it itself expects, because it does not know it is coming from or going to C++.

Yakk - Adam Nevraumont
3

Mangling the names of C functions and variables would allow their types to be checked at link time. Currently, all (?) C implementations allow you to define a variable in one file and call it as a function in another. Or you can declare a function with a wrong signature (e.g. void fopen(double)) and then call it.

I proposed a scheme for the type-safe linkage of C variables and functions through the use of mangling back in 1991. The scheme was never adopted because, as others have noted here, this would destroy backward compatibility.

Diomidis Spinellis
  • 1
    You mean "allow their types to be checked at *link* time". Types *are* checked at compile time, but linking with unmangled names cannot check whether the declarations used in the different compilation units agree. And if they don't agree, it's your build system that's fundamentally broken and needs to be fixed. – cmaster - reinstate monica Apr 17 '16 at 20:08