
Does the C++ standard guarantee that thread stacks (as in those of threads started by a std::thread) do not overlap? In particular, is there a guarantee that each thread will have its own exclusive range in the process's address space for its stack? Where is this described in the standard?

For example

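#include <atomic>
#include <bit>
#include <cstdint>
#include <iostream>
#include <thread>

std::uintptr_t foo() {
    auto integer = int{0};
    return std::bit_cast<std::uintptr_t>(&integer);
    // ...
}

void bar(std::uint64_t id, std::atomic<std::uint64_t>& atomic) {
    // Spin until it is this thread's turn.
    while (atomic.load() != id) {}
    std::cout << foo() << std::endl;
    atomic.fetch_add(1);
}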

int main() {
    auto atomic = std::atomic<std::uint64_t>{0};
    auto one = std::thread{[&]() { bar(0, atomic); }};
    auto two = std::thread{[&]() { bar(1, atomic); }};

    one.join();
    two.join();
}

Can this ever print the same value twice? It feels like the standard should provide this guarantee somewhere, but I'm not sure.

Florian Weimer
Curious
  • That's an interesting question. I always just used common sense to figure that stacks never overlap. Imagine if they _could_ overlap -- how on earth could you expect a program's behavior to be well-defined? – paddy Feb 18 '19 at 01:35
  • @paddy I agree, but curious what you mean and if you have some situation in mind where this can cause the program to explode. Do you have an example? These things start getting strange once you introduce coroutines into the mix.. Where suddenly, if foo() is a coroutine, this becomes possible since the coroutine frame is heap allocated – Curious Feb 18 '19 at 01:43
  • Heap and stack are completely different. Even if `foo` is a lambda or whatever, actually _calling_ `foo` from a thread will use that thread's stack. The purpose of a stack is to provide guaranteed non-overlapping storage in which a process stores temporary data required for execution, including return addresses from function calls. If two threads could have their stack pointers collide or cross, you're in for a very rough time. – paddy Feb 18 '19 at 01:54
  • @paddy The thing that worries me is - code that starts relying on this behavior is already in for a rough time if the code is put in, say a coroutine, or a fiber. Both of which are coming to the standard soon – Curious Feb 18 '19 at 01:56
  • I skim-read the draft standard N4775 section 11.4.4 on coroutines, and the document appears to describe the _state_ of a coroutine in a very similar manner to that of a stack. – paddy Feb 18 '19 at 02:06
  • @paddy The problem comes up when you call into a coroutine, that returns a value and then you rely on say the address of that value like in the above example. Something like this https://wandbox.org/permlink/w8ki9SbWzK4a0N1l – Curious Feb 18 '19 at 02:14
  • As an aside, with pthreads I used to have to set the stack size on entry. See https://unix.stackexchange.com/questions/127602/default-stack-size-for-pthreads for example. – london-deveoper Feb 18 '19 at 02:15
  • I think part of what you're asking is, "Will coroutines be threadsafe." I don't know the future, but I can tell you that any time one memory location is being updated by multiple threads (such as an activation record on the heap), you *need* a scheme for synchronization or else it's undefined behavior. And considering that adding coroutines to C++ is in part inspired by parallelism, I'd wager that they've thought about it and are coming up with a solution. You might even be part of that solution if you knew the right people... – Humphrey Winnebago Feb 18 '19 at 03:25
  • What I don't understand about your line of questioning is that your concern arises entirely around returning the address of a stack variable. That is inherently undefined behavior, as I understand it. Doesn't matter whether you're calling a traditional function, lambda or coroutine. What the standard _does_ seem to guarantee however, is that the coroutine's stack will remain intact until `co_return` is called or execution falls off the end of the coroutine. – paddy Feb 18 '19 at 04:36
  • @paddy I don't think I understand, what do you mean by taking the address of a stack variable is undefined behavior? I have not read anything about this before. – Curious Feb 18 '19 at 05:06
  • I mean using that address for any reason outside the scope in which that pointer was originally valid. The pointer has no meaning after whatever it referenced goes out of scope. So here, you created an integer on the stack inside `foo`, and your whole argument appears to center around the value of the pointer to that integer _after_ the integer goes out of scope. Perhaps I am missing the point here. – paddy Feb 19 '19 at 00:17
  • @paddy Printing the address sounds like very well defined behavior to me right? I understand your point otherwise - the pointer itself is meaningless after the integer has gone out of scope. – Curious Feb 19 '19 at 00:18
  • Sure, if by well-defined you mean "a value will be printed". But _what_ value will be printed is not defined at all. I think that if `foo` is a coroutine that returns its value with `co_return`, there is nothing stopping two non-overlapping calls to it in multiple threads from returning the same value. In fact, even without coroutines, your example does not mandate that both threads exist concurrently. It's conceivable that one thread could complete before the other is created, and thus the second thread could inherit the same stack address range as the first. – paddy Feb 19 '19 at 00:33

2 Answers

The C++ standard does not even require that function calls are implemented using a stack (or that threads have a stack in this sense).

The current C++ draft says this about overlapping objects:

Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and they are of different types; otherwise, they have distinct addresses and occupy disjoint bytes of storage.

And in the (non-normative) footnote:

Under the “as-if” rule an implementation is allowed to store two objects at the same machine address or not store an object at all if the program cannot observe the difference ([intro.execution]).

In your example, I do not think the threads synchronize in the way you probably intended: the lifetimes of the two integer objects do not necessarily overlap, so both objects can be placed at the same address.

If the code were fixed to synchronize properly and foo were manually inlined into bar, in such a way that the integer object still exists when its address is printed, then there would have to be two objects allocated at different addresses because the difference is observable.
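To make the overlapping-lifetime case concrete, here is a minimal sketch, not a definitive fix of the question's code. The helper name `addresses_of_live_locals` and the spin-wait handshake are my own assumptions. Each thread records the address of its local object and then keeps that object alive until the other thread has done the same, so the two lifetimes overlap while the addresses are observed. The mapping from pointers to `std::uintptr_t` is implementation-defined, so the comparison of the integer values relies on typical implementations.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <utility>

// Hypothetical helper (not from the question): returns the addresses of two
// local objects, one per thread, observed while both objects are alive.
inline std::pair<std::uintptr_t, std::uintptr_t> addresses_of_live_locals() {
    std::atomic<int> ready{0};
    std::uintptr_t addrs[2] = {0, 0};

    auto worker = [&](int slot) {
        int integer = 0;  // the object whose address is observed
        addrs[slot] = reinterpret_cast<std::uintptr_t>(&integer);
        ready.fetch_add(1);
        // Keep `integer` alive until both workers have recorded an address.
        while (ready.load() != 2) {
            std::this_thread::yield();
        }
    };

    std::thread one{worker, 0};
    std::thread two{worker, 1};
    one.join();
    two.join();  // join() makes the writes to addrs visible here
    return {addrs[0], addrs[1]};
}
```

Calling `addresses_of_live_locals()` and comparing `.first` with `.second` should show two different values, because the difference is observable and both objects exist at the same time.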

However, none of this tells you whether stackful coroutines can be implemented in C++ without compiler help. Real-world compilers make assumptions about the execution environment that are not reflected in the C++ standard and are only implied by the ABI standards. Particularly relevant to stack-switching coroutines is the fact that the address of the thread descriptor and thread-local variables does not change while executing a function (because they can be expensive to compute and the compiler emits code to cache them in registers or on the stack).

This is what can happen:

  1. Coroutine runs on thread A and accesses errno.

  2. Coroutine is suspended from thread A.

  3. Coroutine resumes on thread B.

  4. Coroutine accesses errno again.

At this point, thread B will access the errno value of thread A, which might well be doing something completely different at this point with it.
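The reason a cached address goes stale can be sketched without any coroutine library. The helper below is an assumption of mine, not part of any real coroutine API: it only shows that two concurrently running threads see different addresses for `errno`, which is exactly the address a compiler may cache across a suspension point.

```cpp
#include <atomic>
#include <cerrno>
#include <cstdint>
#include <thread>

// Hypothetical helper: returns true if two threads that are alive at the
// same time observe different addresses for errno.
inline bool errno_is_per_thread() {
    std::atomic<int> ready{0};
    std::uintptr_t addrs[2] = {0, 0};

    auto worker = [&](int slot) {
        // errno designates a separate object in each thread.
        addrs[slot] = reinterpret_cast<std::uintptr_t>(&errno);
        ready.fetch_add(1);
        // Stay alive until the other thread has recorded its address too.
        while (ready.load() != 2) {
            std::this_thread::yield();
        }
    };

    std::thread a{worker, 0};
    std::thread b{worker, 1};
    a.join();
    b.join();
    return addrs[0] != addrs[1];
}
```

If a function body cached thread A's `&errno` before step 2 and kept using it after step 3, it would be writing through the first of these two addresses while running on thread B.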

This problem is avoided if a coroutine is only ever resumed on the same thread on which it was suspended, which is very restrictive and probably not what most coroutine library authors have in mind. The worst part is that resuming on the wrong thread is likely to appear to work most of the time, because misusing some widely used thread-local variables (such as errno) as if they were not quite thread-local does not immediately result in an obviously buggy program.

Florian Weimer
  • I was talking more about stackless coroutines here (i.e. the language feature actively in development, pending merge into C++20 in Kona next week) I understand your point above though. Not sure what to make of it - are you saying that stackful coroutine (fibers) authors should take thread-locality into account? – Curious Feb 18 '19 at 08:39
  • *If the code were fixed to synchronize properly and foo were manually inlined into bar* - Hmmm, why would foo need to be inlined to forbid using the same address for both? If they are synchronized properly, we know that the two foo() calls happen at the same time, and therefore the values of the address of `integer` cannot possibly be the same? – Curious Feb 18 '19 at 08:45

For all the Standard cares, implementations call new __StackFrameFoo when foo() needs a stack frame. Where those end, who knows.

The chief rule is that different objects have different addresses, and that includes objects that "live on the stack". But the rule only applies to two objects that exist at the same time, and then only as far as the comparison is done with proper thread synchronization. And of course, comparing addresses does hinder the optimizer, which might need to assign an address to an object that could otherwise be optimized out.
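As a rough illustration only: the sketch below models what such an implementation might do by hand. The type `StackFrameFoo`, its members, and the helper functions are hypothetical (the double-underscore name above is reserved for implementations, so it is not used here), and no real compiler is claimed to work this way. Frames are allocated individually and linked to their caller, so they can end up anywhere in memory, yet two frames that are alive at the same time still have distinct addresses.

```cpp
#include <memory>

// Hypothetical frame layout for a call to foo(); purely illustrative.
struct StackFrameFoo {
    StackFrameFoo* caller;  // link to the calling frame, nullptr for the root
    int integer;            // the local variable of foo()
};

// Counts the frames reachable from `frame` through the caller links.
inline int chain_depth(const StackFrameFoo* frame) {
    int depth = 0;
    for (; frame != nullptr; frame = frame->caller) {
        ++depth;
    }
    return depth;
}

// Allocates two frames that are alive at the same time and checks that
// their locals do not share an address.
inline bool live_frames_are_distinct() {
    auto outer = std::make_unique<StackFrameFoo>(StackFrameFoo{nullptr, 0});
    auto inner = std::make_unique<StackFrameFoo>(StackFrameFoo{outer.get(), 0});
    return &outer->integer != &inner->integer && chain_depth(inner.get()) == 2;
}
```

Once `outer` and `inner` are destroyed, a later allocation is free to reuse either address, which mirrors the "exist at the same time" condition above.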

MSalters
  • *For all the Standard cares, implementations call new __StackFrameFoo when foo() needs a stack frame. Where those end, who knows* - What do you mean by this? A stack pointer shift? – Curious Feb 18 '19 at 01:57
  • @Curious: In such an implementation, stack frames would form a linked list, with the list elements scattered through memory. – MSalters Feb 18 '19 at 07:51