There appear to be two questions here:
- Does casting a function pointer to another function pointer type and calling the result trigger undefined behavior?
- How can you write objc_msgSend() such that you can pass it any number of arguments and expect the correct return type, arbitrarily?
Undefined Behavior
For the first: I started fleshing out this part of the answer by referencing the C11 draft standard (the finalized C standard documents are behind a paywall, but the published draft docs are functionally identical), but as I'm not a language-lawyer, I'm not entirely confident in answering this part of the question to your satisfaction.
The relevant parts of the standards doc to reference:
If you squint just right, you might be able to read "that the function has no parameters" as equivalent to having "an empty parameter list" in some sense, in which case, it can be safely passed any number of arguments since it doesn't specify any. (Somewhat intuitively: the risk in casting between incompatible function pointer types is that you read memory for an argument as if it were of another type, which is invalid. If a function declares that it doesn't accept any parameters, then it claims that it will never read any values passed to it, so the compiler can safely assume that it can pass in any arguments it wants because they'll never be used. In practice, of course, the function can do whatever it wants.)
The return value aspect is a bit tougher to explain, hence my hesitance. §6.2.7 describes compatibility between types, but it doesn't mention void in any way, and is otherwise pretty vague. From elsewhere:
§6.2.5¶1
At various points within a translation unit an object type may be incomplete (lacking sufficient information to determine the size of objects of that type)
§6.2.5¶19
The void type comprises an empty set of values; it is an incomplete object type that cannot be completed.
So void is an "incomplete" type whose size and alignment are never known, but it doesn't appear to be explicitly stated anywhere that incomplete types (or void) and complete types are incompatible. (For the most part, an "incomplete" type just means that the compiler isn't aware of its definition and can't help you prevent invalid casts or alignments; I'm not aware of stricter requirements on such types.)
The C standard is full of holes like this, where behavior can be somewhat sneakily gleaned not by what is said, but by what is left out. Someone with more experience than me in this area may be able to point to something in the standard which refutes this explicitly, but effectively, it appears that the standard implicitly leaves some leeway in expected behavior to allow this to be valid.
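For contrast, one thing C11 §6.3.2.3¶8 does state outright is that a function pointer may be converted to another function pointer type and back again, and that the result compares equal to the original pointer. A minimal sketch of that round trip follows; the names generic_fn, add, and call_add_via_generic are made up for illustration.

```c
#include <assert.h>

/* Hypothetical "untyped" function pointer type, used only for storage. */
typedef void (*generic_fn)(void);

int add(int a, int b) { return a + b; }

/* Store the pointer under the generic type, then convert it back to the
 * function's real type before calling through it. */
int call_add_via_generic(int a, int b) {
    generic_fn stored = (generic_fn)add;                 /* convert away */
    int (*typed)(int, int) = (int (*)(int, int))stored;  /* convert back */

    /* The round trip is required to yield the original pointer. */
    assert(typed == add);

    /* The call goes through a pointer whose type matches the definition. */
    return typed(a, b);
}
```

Here call_add_via_generic(2, 3) evaluates to 5. The same paragraph says that calling through a converted pointer whose type is not compatible with the function's type is undefined; whether a given pair of types counts as compatible is the part discussed above.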
Writing objc_msgSend()
How could a C … function be written …?
Here's the trick: objc_msgSend is necessarily written in assembly because it cannot possibly be written in C. It's not even really a function in the way that you might expect.
The purpose of objc_msgSend is to take the arbitrary arguments it's given, find the pointer to the method with the given selector name for the receiver, and pass those arguments along exactly. In C, you can't do this, because C functions set up stack frames and have to preserve certain registers and stack values; setting up a stack frame also means that the method you call has to return to objc_msgSend itself when it returns, and the stack frame then has to be torn down. This is a lot of wasted work, and it also means that your stack traces are littered with objc_msgSend frames, which is just noise. Writing this directly in assembly allows these limitations to be bypassed.
Mike Ash goes into objc_msgSend in significantly more detail in several articles on his blog[1][2], but the gist:
- objc_msgSend is exposed as a C function, but its implementation is in assembly
- When called from C, the stack and registers are set up by the caller exactly how the recipient method expects to receive them, because objc_msgSend appears to have a regular C calling convention
- objc_msgSend itself doesn't disturb the argument registers or the stack, and doesn't set up a stack frame or modify the return address; it simply finds the correct function pointer to pass execution off to, based on the recipient object and the method name
- When the method is then called, because objc_msgSend hasn't disturbed the arguments or the stack, it appears that the method was called directly, without objc_msgSend ever having been there. And because objc_msgSend hasn't modified the return address for the method, execution returns directly to the caller of objc_msgSend, which can then safely read the return value wherever the calling convention places it, because it received that value directly from the called method
Because you have to cast objc_msgSend to the method's actual function type to call it from C, the compiler will set up the arguments to the method and read the return value for you correctly, provided you've got the types right.
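The casting step can be illustrated without the Objective-C runtime. The sketch below is plain C with hypothetical names (Object, Method, IMP_generic, lookup_imp, send_add, send_scaled), and it is not how objc_msgSend is implemented: here the lookup and the call are two separate steps in the caller, rather than a single jump in assembly. It only shows how casting an untyped function pointer to the method's real signature lets the compiler set up the arguments and read the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical untyped "method implementation" pointer. */
typedef void (*IMP_generic)(void);

typedef struct Object Object;

typedef struct {
    const char *selector;  /* method name, standing in for a SEL */
    IMP_generic imp;       /* implementation, stored under the generic type */
} Method;

struct Object {
    const Method *methods;
    size_t method_count;
    int value;
};

/* Two "methods" with different argument and return types. */
int obj_add(Object *self, const char *sel, int n) {
    (void)sel;
    return self->value + n;
}

double obj_scaled(Object *self, const char *sel, double factor) {
    (void)sel;
    return self->value * factor;
}

static const Method kMethods[] = {
    { "add:",      (IMP_generic)obj_add },
    { "scaledBy:", (IMP_generic)obj_scaled },
};

Object make_object(int value) {
    Object o = { kMethods, sizeof kMethods / sizeof kMethods[0], value };
    return o;
}

/* Find the implementation for a selector; NULL if the receiver lacks it. */
IMP_generic lookup_imp(const Object *receiver, const char *sel) {
    for (size_t i = 0; i < receiver->method_count; i++) {
        if (strcmp(receiver->methods[i].selector, sel) == 0) {
            return receiver->methods[i].imp;
        }
    }
    return NULL;
}

/* Each call site casts the generic pointer to the method's real signature,
 * so the compiler passes the arguments and reads the result per that type. */
int send_add(Object *receiver, int n) {
    IMP_generic imp = lookup_imp(receiver, "add:");
    assert(imp != NULL);
    return ((int (*)(Object *, const char *, int))imp)(receiver, "add:", n);
}

double send_scaled(Object *receiver, double factor) {
    IMP_generic imp = lookup_imp(receiver, "scaledBy:");
    assert(imp != NULL);
    return ((double (*)(Object *, const char *, double))imp)(receiver, "scaledBy:", factor);
}
```

With an object built by make_object(40), send_add(&o, 2) returns 42 and send_scaled(&o, 0.5) returns 20.0. If either cast named the wrong signature, the compiler would still accept it, and the arguments or the return value would be read under the wrong type.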