This is a follow-up to my previous question: What are the digits in an ObjC method type encoding string?

Say there is an encoding:

v24@0:4:8@12B16@20

How are those numbers calculated? B is a char so it should occupy just 1 byte (not 4 bytes). Does it have something to do with "alignment"? What is the size of void?

Is it correct to calculate the numbers as follows? Ask sizeof on every item and round up the result to multiple of 4? And the first number becomes the sum of all the other ones?

Ecir Hana
  • You may want to poke around in [objc-typeencoding.m](http://opensource.apple.com/source/objc4/objc4-493.11/runtime/objc-typeencoding.m). – jscs Jul 17 '12 at 21:41
  • Interesting. But it's just for reading the codes (not for generating them), right? Or you probably meant to show how the codes are used...thanks! – Ecir Hana Jul 17 '12 at 21:52
  • Yes, but it gives some insight into how they're put together. – jscs Jul 17 '12 at 21:53

3 Answers

19

The numbers were used in the m68K days to denote stack layout. That is, you could literally decode the method signature and, for just about all types, know exactly which bytes at what offset within the stack frame you could diddle to get/set arguments.

This worked because the m68K's ABI was entirely [IIRC -- been a long long time] stack-based argument/return passing. There wasn't anything shoved into registers across call boundaries.

However, as Objective-C was ported to other platforms, always-on-the-stack was no longer the calling convention. Arguments and return values are often passed in registers.

Thus, those offsets are now useless. As well, the type encoding used by the compiler is no longer complete (because it never was terribly useful) and there will be types that won't be encoded. Not to mention that encoding some C++ templatized types yields method type encoding strings that can be many kilobytes in size (I think the record I ran into was around 30K of type information).

So, no, it isn't correct to use sizeof() to generate the numbers because they are effectively meaningless to everything. The only reason why they still exist is for binary compatibility; there are bits of esoteric code here and there that still parse the type encoding string with the expectation that there will be random numbers sprinkled here and there.
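To make the "parse the string and expect numbers" point concrete, here is a minimal sketch in C of such a parser. It is an illustration only, not code from the runtime: the names `enc_item` and `parse_encoding` are hypothetical, and it assumes a simple encoding made of single-character type codes, each optionally followed by decimal digits (no structs, arrays, bit-fields, or qualifiers).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record: one type code plus the number that follows it. */
struct enc_item {
    char code;   /* e.g. 'v', '@', ':', 'B' */
    int  number; /* digits after the code; 0 if there were none */
};

/* Splits a simple encoding such as "v24@0:4:8@12B16@20" into items.
 * Assumption: every type is a single character. Real encodings can
 * contain structs, arrays and qualifiers, which this does not handle.
 * Returns the number of items, or -1 if max_items is too small. */
int parse_encoding(const char *enc, struct enc_item *items, int max_items)
{
    assert(enc != NULL && items != NULL);
    int n = 0;
    while (*enc != '\0') {
        if (n == max_items)
            return -1;
        items[n].code = *enc++;
        items[n].number = 0;
        /* Accumulate any decimal digits that follow the type code. */
        while (isdigit((unsigned char)*enc)) {
            items[n].number = items[n].number * 10 + (*enc - '0');
            enc++;
        }
        n++;
    }
    return n;
}
```

With this sketch, `v24@0:4:8@12B16@20` and the digit-free `v@::@B@` both split into the same seven type codes; only the numbers differ. A missing number is indistinguishable from an explicit 0 here, which is acceptable for an illustration but worth knowing.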

Note that there are vestiges of API in the ObjC runtime that still lead one to believe that it might be possible to encode/decode stack frames on the fly. It really isn't, as the C ABI doesn't guarantee that argument registers will be preserved across call boundaries in the face of optimization. You'd have to drop to assembly and things get ugly really really fast (>shudder<).

bbum
  • I've been monkeying around with these things lately, and started wondering if the digits actually meant anything (especially given the use of registers). Thanks for the confirmation! – jscs Jul 17 '12 at 17:45
  • Their roots lie in Smalltalk, really. Or, specifically, in the notion that Objective-C could bring the full dynamism of Smalltalk to a compiled C-based ABI. It sorta worked on the m68k, but proved to be pretty useless. Hence the move in the last decade to more and more specific typing; still allows for at-runtime dynamism, but takes full advantage of the compiler's type safety (and, with the analyzer, pattern analysis). – bbum Jul 17 '12 at 18:07
  • Thanks a lot for the wonderful reply! Does it mean that if I register a new method to a new class I can pass just `v0@0:0:0@0B0@0` or even `v@::@B@` to `class_addMethod`? – Ecir Hana Jul 17 '12 at 19:24
  • In general, you can just pass an empty string. Pretty much nothing actually looks at that string any more outside of NSInvocation or, maybe (haven't read the code in a bit), method forwarding. For a long while, implementations have focused on not mucking with stack frames at all. I pass, at the least, `@8@0:4` out of habit, but I'm old. You might find this interesting: http://www.friday.com/bbum/2009/12/18/objc_msgsend-part-1-the-road-map/ – bbum Jul 17 '12 at 19:31
  • Thanks! Would you mind looking at my other question (http://stackoverflow.com/questions/11511585/objective-c-is-a-method-variadic), as you seem to know a thing or two about such things? – Ecir Hana Jul 17 '12 at 19:36
  • I can confirm that `NSMethodSignature` expects a non-empty types string for `signatureWithObjCTypes:`, and that fiddling with an `NSInvocation` that has an incorrect method signature is quite likely to raise an exception (e.g., `getArgument:atIndex:` relies on the same info as the method signature's `numberOfArguments` for bounds checking). @EcirHana – jscs Jul 17 '12 at 19:58
  • @JoshCaswell: thanks for the info. Do you know what would happen if I were to pass e.g. `v0@0:0:0@0B0@0`? I mean, if at least the number of codes were correct..? – Ecir Hana Jul 17 '12 at 20:39
  • @Ecir: I guess that would depend on what you're doing with it. The `NSInvocation` seems like it may be happy (in the **very** limited testing I did a little while ago) with an encoding string that has _no_ digits at all, as long as the types are correct and of the correct number. – jscs Jul 17 '12 at 20:46
  • What Josh said; anything that deals with the encoding strings will be somewhat fragile, NSInvocation and similar included. Other, more esoteric (often older), mechanisms will be more fragile. Since there are type encoding strings without digits in certain contexts, it isn't surprising that some things can deal with them. IIRC, class_addMethod() and friends don't use the digitized form. – bbum Jul 17 '12 at 21:14
  • Btw., from the docs: "[getArgument:atIndex:] method raises NSInvalidArgumentException if *index* is greater than the actual number of arguments for the selector." I guess this is what @JoshCaswell said above. – Ecir Hana Jul 17 '12 at 21:37
  • Yeah; you can't get any kind of metadata about the goop that is passed as varargs. And, in fact, the ABI between vararg argument encoding and non-vararg encoding can be different. This is part of why the ARC compiler requires you to explicitly typecast `objc_msgSend()` to a particular strongly typed declaration (this and also to fulfill the ARC contract, which is more conservative than the standard C ABI). – bbum Jul 17 '12 at 21:52
10

The full encoding string is constructed (in clang) by the method ASTContext::getObjCEncodingForMethodDecl, which you can find in clang/lib/AST/ASTContext.cpp.

The method that does the size rounding is ASTContext::getObjCEncodingTypeSize, in the same file. It forces each size to be at least the size of an int. On all of Apple's current platforms, an int is 4 bytes.
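As a rough illustration of that rule, the sketch below rebuilds the string from the question in C. It is a simplified model, not clang's code: `arg_slot`, `slot_size` and `encode_method` are hypothetical names, the sizes are passed in by hand (4-byte pointers and selectors, a 1-byte `B`, i.e. an assumed 32-bit target), and the "at least an int" clamp is applied to every slot for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one argument: type code and raw size. */
struct arg_slot {
    const char *code; /* e.g. "@", ":", "B" */
    int size;         /* raw size in bytes on the assumed target */
};

/* Simplified model of the rule: a slot is never smaller than an int. */
int slot_size(int size, int int_size)
{
    return size < int_size ? int_size : size;
}

/* Writes "<ret><total><code><offset>..." into out and returns the total.
 * Each offset is the running sum of the preceding slot sizes; the total
 * is the sum of all slot sizes. */
int encode_method(const char *ret, const struct arg_slot *args, int n,
                  int int_size, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);

    int total = 0;
    for (int i = 0; i < n; i++)
        total += slot_size(args[i].size, int_size);

    int len = snprintf(out, cap, "%s%d", ret, total);
    int offset = 0;
    for (int i = 0; i < n && len >= 0 && (size_t)len < cap; i++) {
        int w = snprintf(out + len, cap - (size_t)len, "%s%d",
                         args[i].code, offset);
        if (w < 0)
            break;
        len += w;
        offset += slot_size(args[i].size, int_size);
    }
    return total;
}
```

Feeding it the six slots `@`, `:`, `:`, `@`, `B`, `@` with sizes 4, 4, 4, 4, 1, 4, a return code of `v` and an int size of 4 gives `v24@0:4:8@12B16@20`: the 1-byte `B` still takes a 4-byte slot, which is why the last `@` sits at offset 20 rather than 17.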

rob mayoff
1

The stack frame size and argument offsets are calculated by the compiler. I'm actually trying to track this down in the Clang source myself this week; it possibly has something to do with CodeGenTypes::arrangeObjCMessageSendSignature. (Looks like Rob just made my life a lot easier!)

The first number is the sum of the argument sizes (not of the offsets), yes -- it's the total space occupied by the arguments. To get the size of the type represented by an ObjC type encoding in your code, you should use NSGetSizeAndAlignment().

jscs