WebAssembly: thread-safety and C/C++ local variables

Question

I'm trying to understand the WebAssembly memory model, especially from this perspective: what risks am I exposed to when sharing linear memory between WebAssembly instances? The basic memory model that all C/C++ => wasm tutorials give us is as follows (the stack starts at __heap_base - 1 and grows downwards):

+-----------------------------------------------+
| ? | static data |     stack     |     heap    |
+-----------------------------------------------+
^   ^             ^               ^             ^
|   |             |               |             |
0 __global_base  __data_end     __heap_base  MAX_MEMORY

But the following fact surprised me. From https://webassembly.org/docs/security/:

Local variables with unclear static scope (e.g. are used by the address-of operator, or are of type struct and returned by value) are stored in a separate user-addressable stack in linear memory at compile time. This is an isolated memory region with fixed maximum size that is zero initialized by default.

and from https://github.com/WebAssembly/design/blob/main/Rationale.md#locals:

C/C++ makes it possible to take the address of a function's local values and pass this pointer to callees or to other threads. Since WebAssembly's local variables are outside the address space, C/C++ compilers implement address-taken variables by creating a separate stack data structure within linear memory. This stack is sometimes called the "aliased" stack, since it is used for variables which may be pointed to by pointers.

In other words, the stack defined from __heap_base - 1 to __data_end is an implementation artifact of C/C++ compiled modules. The "WASM stack" lives outside the linear memory. It just happens that, when you take the address of a local (for example), the compiler stores it in the "aliased stack" instead so there's an address to take.

Doesn't this behavior open the door to a new kind of very dangerous data race when shared memory is used?

Imagine a piece of code like this:

int calculation(int param1, int param2)
{
    if (param1 == param2 * 2)
        ++param1;
    else
        ++param2;

    return param1 / 3 + param2;
}

Here, calculation is thread-safe. However, suppose I replace calculation with this equivalent form:

int calculation(int param1, int param2)
{
    int* param = param1 == param2 * 2 ? &param1 : &param2;

    ++*param;

    return param1 / 3 + param2;
}

Depending on the compiler's output, calculation may no longer be thread-safe if param1 and/or param2 are stored on the aliased stack. That stack lives in linear memory, which can be shared with other instances when shared memory is enabled by the --features=atomics,bulk-memory --shared-memory flags.

So, in which exact situations can the compiler decide to store a local variable on the aliased-stack?

EDIT: I ran some tests to verify this, and I would like to know whether my reading is right. In a function that uses 16 unsigned local variables, I stored on the heap the memory addresses of the first, the middle and the last local, and printed them from JavaScript. The difference between the lowest stored address and __heap_base was 32*3 bytes + padding, not 32*16 + padding, which means that only the three variables whose addresses were taken were stored on the aliased stack. Of course, these tests are not thread-safe, because I'm storing the addresses of locals outside the function, but they illustrate the point: if, in a re-entrant function, I temporarily take the address of a local for implementation convenience, and the code is complex enough that the compiler cannot tell what I'm trying to do, it may end up storing the local on the aliased stack instead of rewriting the code, which would make the function thread-unsafe.

I don't know much about WASM, but I would presume the WASM stack is after all a *stack*. Each call to `calculation` would allocate its own memory for `param1` and `param2`, so separate calls, whether from different threads or recursive, should get independent memory to use for its `param1` and `param2`. — Nate Eldredge, Oct 31 '21 at 18:55
I don't know how WebAssembly memory works, but you can certainly assume that they aren't idiots and that there is no way calling the same simple function from multiple threads can cause a data race. Just try to get a better idea of what they mean by sharing linear memory. — ALX23z, Oct 31 '21 at 18:55
I would also assume that the "aliased stack", and its stack pointer, are thread-local, i.e. every thread has its own. So that takes care of the thread safety aspect, and recursive calls within a thread still work because it's a stack. — Nate Eldredge, Oct 31 '21 at 19:03
@NateEldredge As far as I know, WebAssembly has no support for multi-threading yet, so I can't launch threads from C++. Parallelism must be implemented at the JavaScript level: I create a shared linear memory and launch several WebWorkers, each one with its own wasm instance but all sharing the same imported memory. The aliased stack will be shared among all instances, and no instance can know how many siblings it is sharing memory with. — spaghetti, Oct 31 '21 at 19:30
@ALX23z I edited my question to clarify that point a bit more. — spaghetti, Oct 31 '21 at 19:31

Nikolay Handzhiyski · Accepted Answer · 2021-11-01T15:39:52.943

3

In a multi-threaded setup, each thread gets its own stack in the shared memory. The stack pointer (it appears to be created by LLVM's createSyntheticSymbols) is kept in a WebAssembly global variable. Currently these globals serve as thread-local storage, which means that each thread has its own copy of the global.

At the start of the WebAssembly instance, the main thread's global points to the main thread's stack in the shared memory. When you start another thread, its global is set during startup to point to a different place in the shared memory, where the stack for that thread is placed.

The allocation of the stack seems to be done by Emscripten's __pthread_create_js if the caller does not supply its own pointer. Variables are allocated on the current stack with stackAlloc, where:

global.get __stack_pointer

gets the current thread's stack pointer. The function then subtracts the needed bytes (the stack grows down), aligns the result to 16 bytes, and stores the new value back into the global. All of that is thread-safe, because the global is accessible only from the thread itself.
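The bookkeeping described above can be sketched in plain C++. This is only an illustration, not Emscripten's actual code: `stack_pointer`, `stack_init`, `stack_alloc` and `stack_restore` are hypothetical names, and a `thread_local` variable stands in for the per-thread WebAssembly global. The sketch does not check for stack overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-thread WebAssembly global __stack_pointer.
// thread_local gives every native thread its own copy, much like every
// instance has its own global.
thread_local std::uintptr_t stack_pointer = 0;

// Point this thread's stack pointer at the top (highest address) of its region.
void stack_init(std::uintptr_t top) {
    assert(top % 16 == 0);  // keep the initial value 16-byte aligned
    stack_pointer = top;
}

// Reserve `size` bytes: move the pointer down, round down to a 16-byte
// boundary, and store the new value back. No overflow check is performed.
std::uintptr_t stack_alloc(std::size_t size) {
    std::uintptr_t sp = stack_pointer - size;
    sp &= ~static_cast<std::uintptr_t>(15);
    stack_pointer = sp;
    return sp;
}

// Release everything allocated since `saved` was read from stack_pointer.
void stack_restore(std::uintptr_t saved) {
    stack_pointer = saved;
}
```

A caller would typically read `stack_pointer` on entry, call `stack_alloc` for each address-taken local, and call `stack_restore` with the saved value before returning. No locking is involved because no other thread can see this thread's copy of the pointer.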

About the pointers: yes, the compiler places variables that are accessed through pointers on an explicit stack. Currently the WebAssembly stack is not "walkable", but there is a proposal to make it so. Many implementations also use an explicit stack to gain finer-grained control over stack usage (variables, structures and so on).

All of this SHOULD (RFC 2119) be transparent to developers, meaning that it appears to just work.

Based on your comments: the WebAssembly standard currently deals with data races through atomic instructions, whose accesses are sequentially consistent. In a multi-threaded program, the memory allocator clearly MUST be thread-safe. The explicit per-thread stack does not have to be (using globals is enough, as described above), because the stack memory is managed only by the thread itself. Check the threads proposal for the atomic instructions and their implementation status. Atomic instructions are allowed in unshared memory as well.
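As a rough native C++ analogue of the point about atomic instructions, the sketch below has several threads update one shared counter with sequentially consistent atomic increments. The function name `count_with_atomics` is hypothetical. That a wasm toolchain with the threads feature enabled lowers `std::atomic` operations to WebAssembly atomic instructions is an assumption about the toolchain, not something shown here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `increments` times. Every update is an atomic read-modify-write, so no
// increment is lost even though all workers touch the same address.
int count_with_atomics(int threads, int increments) {
    assert(threads >= 0 && increments >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Sequentially consistent ordering, as in the answer above.
                counter.fetch_add(1, std::memory_order_seq_cst);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load();
}
```

With a plain `int` and `++counter` instead of `fetch_add`, the same program would contain a data race and could lose increments.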

Some implementations MAY lock the whole memory for non-atomic accesses as well as for atomic ones, if only because the specification does not forbid stronger memory-access guarantees. In that case, even if you create a race at some memory address, you will not read inconsistent or torn values. However, this is just a possibility that SHOULD NOT be relied upon.

edited Nov 01 '21 at 15:39

answered Nov 01 '21 at 07:49

Plus one for linking to the RFC that defines "SHOULD" – spaghetti Nov 01 '21 at 11:39
Three comments about your answer. Comment 1: you seem to explain to me how `pthread_create` is implemented, which is part of the libc implementation for WASM. `pthread_create` is called only if I want to spawn a thread from within the WASM module. I'm more focused on launching WASM modules from WebWorkers, externally, since `libc` adds too much boilerplate and I'm trying to avoid it (at least, until clang gives direct support for the standard library). – spaghetti Nov 01 '21 at 12:23
Comment 2: I'm not using emscripten, but clang++ (with `wasi-sdk`; I was having trouble with the sysroot using my current clang installation and `wasi-sdk` was the only way to go). Do you know by any chance the clang equivalent of `__stack_pointer`, and any way to give it my own pointer per instance? clang doesn't seem to define `__stack_pointer`, or at least not as an exportable symbol (`extern unsigned char __stack_pointer` doesn't work). I'm fine managing the memory myself (see comment 3). – spaghetti Nov 01 '21 at 12:25
Comment 3: I know that thread-safety should be transparent to the developer, but I'm also aware that WASM-threads are not fully supported yet and I'm trying to look for more "manual solutions". – spaghetti Nov 01 '21 at 12:26
Your initial question was: is it thread-safe? Yes, it is. I have given an example with LLVM just to back up my claims. How it is implemented today in one compiler or another might not be the same in a few years. If you want to do some specific initialization, you are better off asking another question for that particular case. See also this for LLVM/Clang: https://stackoverflow.com/questions/5708610/llvm-vs-clang-on-os-x – Nikolay Handzhiyski Nov 01 '21 at 12:57
From your answer, what I understood is that it's thread safe if I launch a thread WITHIN the WASM module using emscripten (in other words, if the C/C++ source file is the one that calls pthread_create, directly or indirectly), but I don't see the connection between your answer and this setup: two web workers, each one launching its own instance of a same WASM module (compiled from C/C++ without emscripten, without wasi-libc, and without using any standard library at all), and sharing the linear memory. – spaghetti Nov 01 '21 at 13:24
These are some new requirements, but the relation to the shared memory remains the same. Data races are covered by the use of atomic variables. If two threads read and write the same address non-atomically, you will have a data race. Check this one for the memory model: https://github.com/WebAssembly/design/issues/1397 . If you have multiple threads using the shared memory you MUST, one way or another, ensure that the explicit stacks do not overlap. That is taken care of for you by emscripten, as I wrote. – Nikolay Handzhiyski Nov 01 '21 at 14:08
WebAssembly by itself has no instructions at this time to create threads. That functionality is left to the host. LLVM and Emscripten just complement each other for this task. In a browser, to create a thread, you use a WebWorker. Currently there is no way to use a WebAssembly file directly in a worker; only JS can load the WebAssembly. That means that, in some way, you need glue JS code to provide you with thread support. However, all of this has little to do with the memory model per se. – Nikolay Handzhiyski Nov 01 '21 at 14:18
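For the setup discussed in these comments (several instances sharing one linear memory without Emscripten's runtime), the requirement that the explicit stacks must not overlap could be met by giving each instance a fixed slice of a reserved region. The sketch below only shows the address arithmetic. `StackLayout`, `stack_bottom`, `stack_top` and `stacks_overlap` are hypothetical names, the sizes are arbitrary, and how the resulting value reaches each instance's stack-pointer global depends on the toolchain and is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout of a region of shared linear memory reserved for stacks.
struct StackLayout {
    std::uint32_t base;           // lowest address of the reserved region
    std::uint32_t stack_size;     // bytes per instance (a multiple of 16)
    std::uint32_t max_instances;  // number of slices in the region
};

// Lowest address belonging to the stack of instance `index`.
std::uint32_t stack_bottom(const StackLayout& layout, std::uint32_t index) {
    assert(index < layout.max_instances);
    return layout.base + index * layout.stack_size;
}

// Initial stack pointer for instance `index`: one past the highest address
// of its slice, because the stack grows downwards.
std::uint32_t stack_top(const StackLayout& layout, std::uint32_t index) {
    return stack_bottom(layout, index) + layout.stack_size;
}

// True if the half-open ranges [bottom, top) of two instances intersect.
bool stacks_overlap(const StackLayout& layout, std::uint32_t a, std::uint32_t b) {
    return stack_bottom(layout, a) < stack_top(layout, b) &&
           stack_bottom(layout, b) < stack_top(layout, a);
}
```

Each worker would need a distinct index, for example one handed out by the host when it creates the worker. The arithmetic keeps slices disjoint only as long as no instance uses more than `stack_size` bytes of stack.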

score 1 · Answer 2 · answered Nov 01 '21 at 09:46

The choice made by WASM isn't that unusual. Split-stack and multi-stack designs are not new, and have always been compatible with C and C++. This is a deliberate result of an under-specification of C, which has always allowed "stack" variables to live in non-addressable registers. The C stack is abstract, and there's only a limited relation to the underlying execution environment.

When C++ adopted the Java Memory Model for C++11 (which C followed), thread safety was not "automatic", but that only applies to C++ objects. "The heap" is not an object in that sense, but a concept, and it's the implementation's responsibility to keep that safe. Note that the C++ standard does not require performance. A global lock to protect the heap is technically allowed.

In this case, that means that WASM should just keep the separate stacks separate (as @Nikolay points out). It doesn't matter what region of memory these stacks occupy, as long as the various fragments of the various stacks do not overlap at any specific moment in time.

Yes that's a good point, I never thought of the "protected stack" of WASM as a set of an infinite amount of registers, which are non-addressable too as you say. — spaghetti, Nov 01 '21 at 12:32
