81

The title of the question might be a bit strange, but as far as I know, nothing speaks against tail call optimization in general. However, while browsing open source projects, I have come across a few functions that actively try to stop the compiler from performing tail call optimization, for example the implementation of CFRunLoopRef, which is full of such hacks:

static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__(CFRunLoopObserverCallBack func, CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) __attribute__((noinline));
static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__(CFRunLoopObserverCallBack func, CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    if (func) {
        func(observer, activity, info);
    }
    getpid(); // thwart tail-call optimization
}

I would love to know why this is seemingly so important, and whether there are any cases where I, as a normal developer, should keep it in mind too. E.g. are there common pitfalls with tail call optimization?

Jonathan Leffler
JustSid
  • One possible pitfall might be that an application works smoothly on several platforms and then suddenly stops working when compiled with a compiler that doesn't support tail call optimization. Remember that this optimization can not only increase performance, but also prevent runtime errors (stack overflows). – Niklas B. May 28 '12 at 21:51
  • @NiklasB. But isn't this a reason to **not** try to disable it? – JustSid May 28 '12 at 21:54
  • A system call might be a sure way of thwarting TCO, but also a pretty expensive one. – Fred Foo May 28 '12 at 21:54
  • @JustSid: Not if you're interested in having a portable program that is independent of specific compiler optimizations. – Niklas B. May 28 '12 at 21:55
  • Do you have other examples too? Could it be that the developer of this particular library has a personal fear of tail-call optimization? – Shahbaz May 28 '12 at 21:56
  • Indeed, on GCC, `__asm__ __volatile__ ( "" : : "memory" );` would be a much cheaper way to do it. – R.. GitHub STOP HELPING ICE May 28 '12 at 21:57
  • @Shahbaz I have seen this in other libs/projects too, but I don't remember which exactly (although they used some `__asm__ volatile` instead of a syscall). I will edit the question when I find them again, but it looks like it's either a more common fear or a problem they are trying to avoid. – JustSid May 28 '12 at 21:57
  • `__attribute__((noinline))` looks suspicious as well. Maybe the author depends on a very specific runtime behaviour with regard to stack setup? – Niklas B. May 28 '12 at 21:59
  • Not sure if this is at all relevant, but I thought there may be a problem in particular with tail-call optimization and OS X, and I came across [this](http://my.safaribooksonline.com/book/operating-systems-and-server-administration/solaris/9780137061839/tips-and-tricks/ch14lev1sec16) searching on Google, which apparently says it causes problems with stack tracing. – Shahbaz May 28 '12 at 22:01
  • This is a great teachable moment for proper commenting. +1 for partially explaining why that line is there (to prevent tail-call optimization), -100 for not explaining why tail-call optimization needed to be disabled in the first place... – Mark Sowul May 28 '12 at 22:09
  • Since the value of `getpid()` is not being used, couldn't it be removed by an informed optimizer (since `getpid` is a function that is known to have no side effects), therefore allowing the compiler to do a tail call optimization anyway? This seems a *really* fragile mechanism. – luiscubal May 28 '12 at 23:26
  • @ArjunShankar You're right, forget it. The answers (especially the accepted one) assumed tail recursion, which threw me off. – Konrad Rudolph May 29 '12 at 10:21
  • @luiscubal How does the compiler know there are no side effects? The getpid() C library call may invoke an OS syscall sys_getpid, which must be assumed to have side effects. In reality, most C library implementations modify internal variables when getpid() is called: the first time I call getpid(), I go to the kernel and the lib caches the return value; the next time I call getpid(), the lib returns the cached value. – Chris Jul 30 '13 at 17:19
  • @Chris Well, `x = getpid(); x = getpid();` is the same as `x = getpid();`, so it can indeed be optimized. The fact that the result is cached and the first time goes to the OS is irrelevant, since there is no `count_number_of_times_getpid_was_called` function. The optimizer does not necessarily have to assume that the call has side effects. If the result is always the same, it could be in some sort of "function whitelist" that is known to be safe and therefore optimizable. – luiscubal Jul 30 '13 at 17:32
  • @luiscubal Typically, a function like getpid is linked dynamically, so compilers have _no_ idea what getpid will do. Even if getpid's code were available to the compiler, as I described before, getpid _does_ have side effects _and_ invokes a syscall, thus forcing the call to happen. At most, `x = getpid(); x = getpid();` can be optimized to `getpid(); x = getpid();` – Chris Jul 30 '13 at 17:52
  • @Chris If getpid were declared with `__attribute__ ((pure))` then they could. It doesn't optimize it, so I'm guessing it doesn't have that attribute. Note that if it did have the attribute in the header, the type of linking would be irrelevant. And what prevents them from adding it to the header? If the standard ensures getpid always returns the same value, then the answer is nothing -- and if you change the implementation, that's your problem. – luiscubal Jul 30 '13 at 18:56
  • I wish there were an attribute or something to tell the compiler not to tail-call optimize a particular function, without having to make the program make useless calls at runtime. – user102008 Apr 29 '15 at 19:58

3 Answers

83

My guess here is that it's to ensure that `__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__` is in the stack trace for debugging purposes. It has `__attribute__((noinline))`, which backs up this idea.

If you notice, that function just bounces to another function anyway, so it's a form of trampoline, which I can only think is there with such a verbose name to aid debugging. This would be especially helpful given that the function is calling a function pointer that has been registered from elsewhere, so that function may not have debugging symbols accessible.

Notice also the other similarly named functions which do similar things - it really looks like it's there to aid in seeing what has happened from a backtrace. Keep in mind that this is core Mac OS X code and will show up in crash reports and process sample reports too.

BenMorel
mattjgalloway
  • Yes, that's consistent with `__attribute__((noinline))`. I think you're spot on here. – Niklas B. May 28 '12 at 22:04
  • Yes, makes sense indeed. But if you look where these functions are called from, you will see that they are always only called from one function, for example my example function is only called from `__CFRunLoopDoObservers` which definitely shows up in the stack trace... – JustSid May 28 '12 at 22:21
  • Sure, but I guess it's another marker for *exactly* where the observer callback / block / etc. is getting run. – mattjgalloway May 28 '12 at 22:25
  • I think this is the best answer. +1 – R.. GitHub STOP HELPING ICE May 28 '12 at 23:43
  • @R.. I can only accept one answer though, and Andrew White also named other cases where tail call optimization might not be wanted. Remember, I didn't ask why the function did it but why it might not be desired in general, and gave the function as a real-world example. – JustSid May 29 '12 at 01:38
34

This is only a guess, but maybe it's to avoid a silent infinite loop in favor of bombing out with a stack overflow error.

Since the method in question doesn't put anything on the stack, tail-call optimization could produce code that would enter an infinite loop, as opposed to the non-optimized code, which would push the return address onto the stack and eventually overflow it in the event of misuse.

The only other thought I have is related to preserving the calls on the stack for debugging and stacktrace printing.

Andrew White
  • I think the stacktrace/debugging explanation is much more likely (and I was about to post it). An infinite loop isn't really worse than crashing, since the user can force the application to quit. That would also explain the noinline. – ughoavgfhw May 28 '12 at 21:59
  • @ughoavgfhw: Maybe, but when you get into threading and such, infinite loops are really hard to track down. I've always been of the mindset that misuse should trigger an exception. Since I've never had to do this, it's still just a guess. – Andrew White May 28 '12 at 22:02
  • Synchronicity, sort of... I've just run into a bad bug that kept an application opening new windows. This makes me think: if the application had crashed before trying to saturate "the heap" (my memory) and choking X, I would not have needed to switch to the terminal to abruptly kill the crazy app (since X soon became unresponsive). So maybe it would be a reason to prefer the "fail fast" approach that could come with a stack overflow and no optimization? Or maybe it's just a different matter! – ShinTakezou May 28 '12 at 22:03
  • @AndrewWhite Hmm, I totally love infinite loops; I can't think of a single thing that's easier to debug, since you can just attach your debugger and get the exact position and state of the problem without any guessing. But if you want to get stacktraces from users, I agree that an infinite loop is problematic, so that seems logical: an error will appear in your log, an infinite loop won't. – Voo May 28 '12 at 23:16
  • Preserving the callstack is the only reason I've used this in practice. With TCO, if `A()` calls `B()` calls `C()`, and `B`'s call to `C()` is a tail call, then if something crashes inside `C()` the callstack will look like `A` called `C` directly, without going through `B`. That can be confusing. – Crashworks May 28 '12 at 23:57
  • This assumes that the function is recursive in the first place, but it isn't; neither directly nor (by looking at the context where the function comes from) indirectly. I made the same mistaken assumption initially. – Konrad Rudolph May 29 '12 at 10:50
21

One potential reason is to make debugging and profiling easier (with TCO, the parent stack frame disappears, which makes stack traces harder to make sense of.)

NPE
  • Making profiling easier at the cost of slowing down the program is kinda weird though. It makes as much sense as diluting your oil before measuring how far your car can go :x – Matthieu M. May 29 '12 at 07:45
  • @MatthieuM.: Such a thing wouldn't make sense if the added call was performed millions of times in a loop, but if it's executed a few hundred times a second or less, it may be better to leave it in the real system and be able to examine how the real system behaves, than to take it out and risk having such removal make a subtle but important change in system behavior. – supercat Feb 24 '15 at 00:26
  • @MatthieuM. If diluting your oil is the prerequisite for any measurement at all, then it actually makes perfect sense. – Dmitry Grigoryev Nov 25 '16 at 10:44
  • @DmitryGrigoryev: No, it doesn't. No measure is annoying, but a wrong measure is from useless to dangerous (depending how much trust you put in it). Continuing with the oil analogy: if it slows you down, then you might get measures that indicate that weight is more important than aerodynamics, and thus remove weight and worsen aerodynamics to optimize for what you've measured... however with real oil, when you go faster, it turns out that aerodynamics were more important and your "improvement" is worse than doing nothing! – Matthieu M. Nov 25 '16 at 12:55
  • @MatthieuM. Are you familiar with the uncertainty principle? Any measurement is wrong to some degree, because there's no way to measure anything without interacting with the object being measured. So, even if you don't alter the oil in your example, instrumenting the car will change aerodynamics anyway. – Dmitry Grigoryev Nov 25 '16 at 13:01
  • @DmitryGrigoryev: Some uncertainty cannot be avoided, certainly, but there is a continuum between small perturbations which loosen the precision and big perturbations which make the measurement completely irrelevant. Preventing TCO has big impacts on deeply recursive function calls: overhead of the call, stack consumption implying L1 cache pollution, etc., and of course it prevents the optimizer from turning the recursive calls into a loop in the worst case (especially if it prevents the compiler from evaluating the computation at compile time). – Matthieu M. Nov 25 '16 at 13:23
  • @MatthieuM. Yes, but TCO is not limited to recursive code, any function that ends with a call can be optimized like that. Sure, profiling recursive stuff with TCO disabled doesn't make sense (but then, I can tell without any profiling where the problem is ;), but the question is not about recursive functions in the first place. – Dmitry Grigoryev Nov 25 '16 at 13:27