Suppose I have something along the lines of

#include <cstdio>

struct Foo {
    void goo() {std::printf("Test");}
};

extern void _ZN3Foo3gooEv(Foo *f);

int main() {
        Foo f;
        _ZN3Foo3gooEv(&f);
}

Is it possible to call Foo::goo() through the name mangled version of the function here?

Edit:

As a clarification, this is just an experiment to see if it's possible to explicitly call a name mangled function. There is no further goal here.

I was taught that all member functions basically take the this pointer as their first argument.

I get that this won't link, but I don't get why. I thought that name mangling happens at compile time, and when the linker runs it resolves the calls to the name mangled function. (That's why I figured if we leave _ZN3Foo3gooEv as extern, it would go to the symbol table to look it up).

Am I misunderstanding something here?

Zig Razor
Henry
  • 8
    This sounds like it may be an instance of the XY problem. Would you be able to share more details about your end goal in doing this? – nanofarad Sep 18 '20 at 01:30
  • 4
    I suspect this isn't possible, because the C++ compiler will mangle `_ZN3Foo3gooEv` into something else, and then it won't match anything. – Mooing Duck Sep 18 '20 at 01:36
  • 1
    You can't call it correctly without an instance anyway, so even if it linked, which it won't, it would never work. Certainly an XY problem here. – user207421 Sep 18 '20 at 01:40
  • 12
    I don't have an end goal in doing this. This is a demo for knowledge's sake. Can you elaborate on what you mean by not being able to call goo without an instance? Isn't goo internally converted into some function that basically takes a pointer to a Foo object? (the this pointer) I assumed that f would serve as our instance here. – Henry Sep 18 '20 at 01:52
  • @MooingDuck when you say _ZN3Foo3gooEv , do you mean the extern function, or the Foo::goo()? – Henry Sep 18 '20 at 02:02
  • The only legal way to call `goo()` is via (1) `foo.goo()` or (2) `foo->goo()` where `foo` is (1) an instance or (2) a pointer to an instance of `Foo`. That's what is meant by providing an instance. – user207421 Sep 18 '20 at 02:30
  • @nanofarad - say you wanted to dynamically call C++ functions in a DLL and you didn't have a .lib file to link to. Or didn't want to link to a lib because you wanted your .exe to still run when the DLL was not present (obviously not calling functions in the non-present DLL). I had to do this years ago and could not crack this - I ended up writing a C to C++ layer that used the lib and dynamically calling that C layer from my exe. – Marc Bernier Sep 18 '20 at 14:10
  • 1
    @MarquisofLorne There's also `std::invoke`, which lets you pass the instance as the first parameter, specifically for working with member function pointers like this, and there's also the bizarre member function pointer syntax `(object).*(ptrToMember)`. – Mooing Duck Sep 18 '20 at 16:13
  • It is a good idea for using C++ libraries generated by gcc in msvc and vice versa. – Amir Saniyan Sep 25 '20 at 15:48

1 Answer

You can, with some caveats.

Either the member function has to be used in a way that makes the compiler emit code for it, or it has to be defined out of line (not inline). In addition, your declaration of the mangled name should be extern "C" to prevent "double mangling". E.g.:

#include <cstdio>

struct Foo {
    const char* message;
    void goo();
};

void Foo::goo() {
    std::printf("%s", this->message);
}

extern "C" void _ZN3Foo3gooEv(Foo *f);

int main() {
        Foo f{ "Test" };
        _ZN3Foo3gooEv(&f);
}

will work fine and be stable specifically in gcc.

This works because, on most systems, the calling convention for member functions is equivalent to the default calling convention for free functions: this is passed to member functions as if it were the first argument, with the explicit arguments taking the later argument-passing slots (registers and/or stack). I believe that this is true for x86-64, 32-bit and 64-bit ARM at least, and 32-bit x86 other than on Windows.

clang seems to specifically support this use case: it inlines Foo::goo into main, whereas gcc treats _ZN3Foo3gooEv and the mangled form of Foo::goo as two separate entities (and thus can't substitute one for the other and inline it).

With MSVC, you can do something similar. However, in 32-bit x86 code on Windows, the __thiscall calling convention is used: instead of passing the this pointer as the first argument, it is passed in the ECX register, with the other args on the stack. If cross-compiling for x86-32 with clang or gcc, you can use [[gnu::thiscall]] (__attribute__((thiscall))). (fastcall is similar if there's only one arg, but with 2 args it would pass the first 2 in registers, not just the first 1.)


But there really should be no reason to do this. It can only be viewed as a compiler extension (since it uses a reserved identifier, one that begins with an underscore followed by a capital letter), and if you need a way to call these functions from C, use a helper void Foo_goo(struct Foo*) that you define in a C++ translation unit. The trick can also call private member functions, but you can already do that in a standards-compliant way with template specialisations.

Peter Cordes
Artyer
  • It will specifically *fail,* at `this->message`, as no value for `this` has been supplied. It is all UB. – user207421 Sep 18 '20 at 01:45
  • 10
    @MarquisofLorne There is no such thing as UB when you're targeting a compiler. – Passer By Sep 18 '20 at 01:55
  • 3
    @MarquisofLorne Ok, it's UB, so it's not valid C++ and you should never do this, but it "works" on these implementations anyway. A value for `this` *has* ("has") been supplied: it's `f`. A non-`virtual` method is, both conceptually and apparently implementationally, just a free function with an extra parameter for `this`. [Here's it not-failing on Clang and GCC](https://godbolt.org/z/cY516c). – HTNW Sep 18 '20 at 01:55
  • So I understand that the extern "C" here forces the compiler to call the mangled version of the function, but I don't understand why the explicit this pointer was necessary. Was it just being optimized away before? – Henry Sep 18 '20 at 02:10
  • 2
    @Henry I just put the `this` as an example to show that you can access data members. `extern "C"` disables name mangling (so it doesn't declare `_Z13_ZN3Foo3gooEvP3Foo`) – Artyer Sep 18 '20 at 02:12
  • @Artyer Thank you! This was a perfect explanation and solution to the question I was asking. I understand what both the problems were. :) – Henry Sep 18 '20 at 02:17
  • 1
    It is UB to call a non-static instance method without an instance, and I don't care what compiler you're talking about. – user207421 Sep 18 '20 at 02:31
  • 14
    @MarquisofLorne UB is a standard construct. A compiler is a piece of software. A piece of software has exactly the behaviour it has; stating it is _undefined_ is categorically wrong. – Passer By Sep 18 '20 at 02:53
  • 2
    @MarquisofLorne The object is passed as `&f`. Though this isn't technically "calling a non-static member function", but using a compiler extension that specific names will mangle into specific symbols which are allowed to be called with C language linkage because the compiler provides an ABI that lets it. – Artyer Sep 18 '20 at 02:54
  • 1
    @MarquisofLorne The posted code calls a free function that takes a `Foo *` as argument, which is aliased to a member function via linker tricks. Granted, the latter step is a toolset-specific hack (as pointed out, and acknowledged), but it is outside the C++ language scope, so I wouldn't call it UB. The idea behind the trick is certainly well known, as a quick search for "*hidden `this` pointer*" can verify. – dxiv Sep 18 '20 at 02:54
  • 6
    @Artyer — `extern "C"` does not disable name mangling. It says to mangle the name the same way a C compiler would. That often means prepending an underscore, so the declaration might have to leave out the leading `_`. – Pete Becker Sep 18 '20 at 03:08
  • 3
    @PasserBy — the behavior is undefined. That means only that the language definition doesn’t tell you what the program does. It doesn’t mean that bad things must happen. – Pete Becker Sep 18 '20 at 03:11
  • In the standard AArch64 calling convention, `this` isn't passed on the stack at all. Most arguments are passed in X0-X7 so `this` would be in X0. – Léo Lam Sep 18 '20 at 09:43
  • 1
    > But there really should be no reason to do this. -- well, almost. Writing ABI-compatibility layers with other compilers is one use case, but mostly found in highly legacy software or reversing :) – ljrk Sep 18 '20 at 10:12
  • 1
    @PasserBy: MSVC definitely has some UB. In particular, there are cases where it has multiple possible behaviors in the case of UB, and it might not even be consistent with itself when that happens. Literally, a single EXE may crash when you start it, and even without recompiling run on the next try. Stuff like ASLR can really wreak havoc in combination with UB; or in other words ASLR is possible because there's no need to care about programs with UB. – MSalters Sep 18 '20 at 11:13
  • 1
    @MSalters Fair point. But if you take it strictly, definition of behaviour simply means the constraints an implementation has, and undefined means lack thereof. It kinda devolves into a game of words at this point though. – Passer By Sep 18 '20 at 11:39
  • @PasserBy: It's not just a game of words; it is possible to use precise language and sort out the relevant concepts. 1. An implementation *can* define the behaviour of something that ISO C++ leaves undefined. e.g. `gcc -fwrapv` defines signed overflow as wrapping, otherwise it's still UB for GCC. 2. In a program that encounters UB, *something* will always happen. Sometimes it even happens to be what you wanted, e.g. some cases of signed overflow where the compiler didn't optimize based on the assumption it wouldn't happen. – Peter Cordes Sep 18 '20 at 16:30
  • 2
    PasserBy and @PasserBy: The question is whether GCC/clang officially (or de-facto in current versions) define and support this behaviour, or whether it's just a "happens to work" thing that could break with different surrounding code and/or compiler options. Specific C++ implementations 100% do still have UB. – Peter Cordes Sep 18 '20 at 16:31
  • @HTNW: The C++ Standard states "Although this document states only requirements on C++ implementations, those requirements are often easier to understand if they are phrased as requirements on programs, parts of programs, or execution of programs. Such requirements have the following meaning...." By my reading, that says that the Standard doesn't classify any programs as "invalid". All it does is indicate that some programs fall outside its jurisdiction, which would be true of most programs that perform tasks for which the Standard makes no provision. – supercat Sep 18 '20 at 18:48