In a previous question (Get object call hierarchy), I got this interesting answer:

The call stack is not there to tell you where you came from. It is to tell you where you are going next.

As far as I know, when arriving at a function call, a program generally does the following:

  1. In calling code:

    • store return address (on the call stack)
    • save registers' states (on the call stack)
    • write parameters that will be passed to function (on the call stack or in registers)
    • jump to target function

  2. In called target code:

    • Retrieve stored variables (if needed)

  3. Return process: Undo what we did when we called the function, i.e. unroll/pop the call stack:

    • remove local variables from the call stack
    • remove function parameters from the call stack
    • restore the registers' state (the one we stored before)
    • jump to return address (the one we stored before)

Question:

How can this be viewed as something that "tells you where you are going next" rather than "tells you where you came from"?

Is there something in C#'s JIT or C#'s runtime environment that makes that call stack work differently?

Thanks for any pointers to documentation about this description of a call stack; there's plenty of documentation about how a traditional call stack works.

Yochai Timmer
  • Both are lies. In the presence of tail calls you have no idea where you came from or where you are going. – leppie Jul 06 '11 at 11:16
  • I think you are overloading callstack. Is the callstack you look at in the debugger window "the" callstack, a metaphor for the callstack or just a useful debugging aid? – Jodrell Jul 06 '11 at 11:23
  • Hmm, Eric tends to answer questions wearing his language implementor glasses. In any practical scenario where you actually *look* at a call stack (the debugger's Call Stack window, an exception's StackTrace property), you are most definitely interested in the "How did I get here" question. Especially in the case of an exception, that stack trace does *not* tell you where you go next. – Hans Passant Jul 06 '11 at 12:54
  • @Hans: I am not sure I agree with you. I agree that of course the *reason* why you look in the stack trace window is to see "how did I get here?" But the fact that it is possible to deduce that information from the continuation information on the stack is a happy accident that makes it easy to implement the feature. There is no *requirement* that the stack tell you where you came from. Regarding your point about exceptions: same thing! The stack is a data structure that tells you (1) where do I go next if there was no exception, and (2) where do I go if there was an exception? – Eric Lippert Jul 06 '11 at 22:16
  • The latter information is of course not included in the *stack trace*, but it most certainly is on *the stack*. We could also build a debugger tool that examined the stack and told you where execution would branch in the event of an exception, but not many customers really care about that information so no one implemented the feature. – Eric Lippert Jul 06 '11 at 22:17

5 Answers

You've explained it yourself. The "return address" by definition tells you where you are going next.

There is no requirement whatsoever that the return address that is put on the stack is an address inside the method that called the method you're in now. It typically is, which sure makes it easier to debug. But there is not a requirement that the return address be an address inside the caller. The optimizer is permitted to -- and sometimes does -- muck with the return address if doing so makes the program faster (or smaller, or whatever it is optimizing for) without changing its meaning.

The purpose of the stack is to make sure that when this subroutine finishes, its continuation -- what happens next -- is correct. The purpose of the stack is not to tell you where you came from. That it usually does so is a happy accident.

Moreover: the stack is just an implementation detail of the concepts of continuation and activation. There is no requirement that both concepts be implemented by the same stack; there could be two stacks, one for activations (local variables) and one for continuation (return addresses). Such architectures are obviously much more resistant to stack smashing attacks by malware because the return address is nowhere near the data.

More interestingly, there is no requirement that there be any stack at all! We use call stacks to implement continuation because they are convenient for the kind of programming we typically do: subroutine-based synchronous calls. We could choose to implement C# as a "Continuation Passing Style" language, where the continuation is actually reified as an object on the heap, not as a bunch of bytes pushed on a million byte system stack. That object is then passed around from method to method, none of which use any stack. (Activations are then reified by breaking each method up into possibly many delegates, each of which is associated with an activation object.)

In continuation passing style there simply is no stack, and no way at all to tell where you came from; the continuation object does not have that information. It only knows where you are going next.
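To make the idea concrete, here is a minimal sketch of continuation passing style written in ordinary C#. The names `CpsDemo`, `Factorial` and `FactorialCps` are invented for illustration, and this is not how any compiler implements the style. It only shows the shape of the idea: the method receives "what to do next" as a delegate instead of returning a value to its caller.

```csharp
using System;

static class CpsDemo
{
    // Direct style: the result travels back through the return address on the stack.
    public static int Factorial(int n) =>
        n <= 1 ? 1 : n * Factorial(n - 1);

    // Continuation passing style: the method never returns a value.
    // Instead it is handed "what to do next" as the delegate k.
    public static void FactorialCps(int n, Action<int> k)
    {
        if (n <= 1)
        {
            // Done: hand the result to whatever comes next.
            k(1);
        }
        else
        {
            // The new continuation remembers n and the old continuation k.
            // It holds no information about who called FactorialCps.
            FactorialCps(n - 1, result => k(n * result));
        }
    }
}
```

A call such as `CpsDemo.FactorialCps(5, r => Console.WriteLine(r))` passes the "rest of the program" in explicitly. Note that C# still uses the machine stack to run these delegate invocations, so the sketch illustrates the concept rather than a stackless implementation.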

This might seem to be highfalutin theoretical mumbo jumbo, but we essentially are making C# and VB into continuation passing style languages in the next version; the coming "async" feature is just continuation passing style in a thin disguise. In the next version, if you use the async feature you will essentially be giving up stack-based programming; there will be no way to look at the call stack and know how you got here, because the stack will frequently be empty.
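As a rough sketch of the "thin disguise", assuming the `async`/`await` feature and the `Task` API as they were released with C# 5: the code after an `await` behaves roughly like a continuation attached by hand with `Task.ContinueWith`. The class `AsyncCpsDemo` and the helper `FetchNumberAsync` are hypothetical names used only for this example, and the real compiler generates a state machine rather than literally calling `ContinueWith`.

```csharp
using System;
using System.Threading.Tasks;

static class AsyncCpsDemo
{
    // Hypothetical helper standing in for some asynchronous operation.
    static Task<int> FetchNumberAsync() => Task.Run(() => 21);

    // Written with await: everything after the await is the continuation.
    public static async Task<int> DoubledWithAwait()
    {
        int n = await FetchNumberAsync();
        return n * 2;
    }

    // Roughly the same logic with the continuation written out by hand.
    public static Task<int> DoubledWithContinuation()
    {
        return FetchNumberAsync().ContinueWith(t => t.Result * 2);
    }
}
```

In both versions the work that multiplies by two is scheduled to run when the first task completes, so it typically does not run with the original caller's frames beneath it on the stack.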

Continuations reified as something other than a call stack is a hard idea for a lot of people to get their minds around; it certainly was for me. But once you get it, it just clicks and makes perfect sense. For a gentle introduction, here are a number of articles I've written on the subject:

An introduction to CPS, with examples in JScript:

http://blogs.msdn.com/b/ericlippert/archive/2005/08/08/recursion-part-four-continuation-passing-style.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/11/recursion-part-five-more-on-cps.aspx

http://blogs.msdn.com/b/ericlippert/archive/2005/08/15/recursion-part-six-making-cps-work.aspx

Here are a dozen articles that start by doing a deeper dive into CPS, and then explain how this all works with the coming "async" feature. Start from the bottom:

http://blogs.msdn.com/b/ericlippert/archive/tags/async/

Languages that support continuation passing style often have a magic control flow primitive called "call with current continuation", or "call/cc" for short. In this Stack Overflow question, I explain the trivial difference between "await" and "call/cc":

How could the new async feature in c# 5.0 be implemented with call/cc?

To get your hands on the official "documentation" (a bunch of white papers), and a preview release of C# and VB's new "async await" feature, plus a forum for support Q&A, go to:

http://msdn.com/vstudio/async

Eric Lippert
  • Thanks for the reply, very interesting. Is there any official documentation yet for the continuation passing style languages implementation in the upcoming version (I guess .Net 5 ?) – Yochai Timmer Jul 06 '11 at 14:07
  • @Yochai: I've added links to all kinds of supporting information. – Eric Lippert Jul 06 '11 at 14:26

Consider the following code:

void Main()
{
    // do something
    A();
    // do something else
}

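void A()
{
    // do some processing
    B(); // last statement in A: a tail call
}

void B()
{
    // when B finishes, it returns to whatever address is on top of the stack
}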

Here, the last thing the function A does is call B; A returns immediately after that. A clever optimizer might optimize out the call to B and replace it with just a jump to B's start address. (I'm not sure whether current C# compilers do such optimizations, but almost all C++ compilers do.) Why would this work? Because the return address into A's caller is already on the stack, so when B finishes, it returns not to A but directly to A's caller.

So, you can see that the stack does not necessarily contain information about where the execution came from, but rather about where it should go next.

Without optimization, inside B the call stack is (I omit the local variables and other stuff for clarity):

----------------------------------------
|address of the code calling A         |
----------------------------------------
|address of the return instruction in A|
----------------------------------------

So the return from B goes back to A, which immediately returns.

With the optimization, the call stack is just:

----------------------------------------
|address of the code calling A         |
----------------------------------------

So B returns directly to Main.
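The two stack pictures can be reproduced with a small simulation. This is only a sketch: `TailCallDemo`, `StackInsideB` and the label strings are made-up names, and a `Stack<string>` of labels stands in for the real machine stack of return addresses.

```csharp
using System;
using System.Collections.Generic;

static class TailCallDemo
{
    // Returns the simulated return-address stack as seen from inside B,
    // bottom of the stack first.
    public static string[] StackInsideB(bool optimizeTailCall)
    {
        var returnAddresses = new Stack<string>();

        // Main calls A: push where to continue in Main once A is done.
        returnAddresses.Push("Main: after call to A");

        // Without the optimization, A calls B normally and pushes where to
        // continue in A. With it, A jumps to B and pushes nothing.
        if (!optimizeTailCall)
        {
            returnAddresses.Push("A: return instruction");
        }

        // We are now "inside B". Stack<T>.ToArray lists the top first,
        // so reverse it to get the bottom of the stack first.
        string[] snapshot = returnAddresses.ToArray();
        Array.Reverse(snapshot);
        return snapshot;
    }
}
```

With `optimizeTailCall` set to `true` the snapshot contains only the entry for Main, so nothing on the simulated stack records that A was ever involved.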

In his answer, Eric mentions other (more complicated) cases where the stack information doesn't contain the real caller.

Vlad
  • But wouldn't C#'s StackTrace object show the actual non-optimized call hierarchy? – Yochai Timmer Jul 06 '11 at 11:43
  • Not really: the stack trace shows the _actual_ stack. How can it know the "intended" stack? This information just doesn't exist. – Vlad Jul 06 '11 at 11:44

What Eric is saying in his post is that the execution pointer does not need to know where it has come from, only where it has to go when the current method ends. Superficially these two things seem to be the same, but in the case of (for instance) tail recursion, where we came from and where we are going next can diverge.

spender

There is more to this than you think.

In C it is entirely possible to have a program rewrite the call stack. Indeed, that technique is the very basis of a style of exploit known as return-oriented programming.

I've also written code in one language which gave you direct control over the call stack. You could pop off the function that called yours, and push some other one in its place. You could duplicate the item on the top of the call stack, so the rest of the code in the calling function would get executed twice, and a bunch of other interesting things. In fact, direct manipulation of the call stack was the primary control structure provided by this language. (Challenge: can anybody identify the language from this description?)

It did clearly show that the call stack indicates where you are going, not where you have been.
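The "duplicate the top of the stack" trick can be imitated in C# with an explicit stack of delegates. This is a sketch under loud assumptions: it is not the unnamed language described above, `StackManipulationDemo` and `Run` are invented names, and a `Stack<Action>` only simulates a call stack whose entries mean "what to do next".

```csharp
using System;
using System.Collections.Generic;

static class StackManipulationDemo
{
    public static List<string> Run(bool duplicateTop)
    {
        var log = new List<string>();
        var stack = new Stack<Action>(); // each entry is "what to do next"

        // The caller's remaining work, pushed as the callee's return target.
        stack.Push(() => log.Add("rest of caller"));

        // The callee runs, and may tamper with the stack before returning.
        log.Add("callee");
        if (duplicateTop)
        {
            stack.Push(stack.Peek());
        }

        // "Returning" means: pop the next entry and run it, until none remain.
        while (stack.Count > 0)
        {
            stack.Pop()();
        }

        return log;
    }
}
```

With `duplicateTop` set to `true`, the log contains "rest of caller" twice, even though the caller made only one call. The stack dictated what ran next, not what had happened before.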

Kevin Cathcart

I think he's trying to say that it tells the called method where to go next.

  • Method A calls Method B.
  • Method B completes, where does it go next?

It pops the return address off the top of the stack and then jumps there.

So Method B knows where to go after it completes. Method B doesn't really care where it came from.

DaveShaw