I'm trying to understand the WebAssembly memory model, especially from this perspective: what risks am I exposed to when sharing linear memory between WebAssembly instances? The basic memory layout that all C/C++ => wasm tutorials give us is as follows (the stack starts at __heap_base - 1 and grows downwards):
+-----------------------------------------------+
|  ?  | static data |    stack    |    heap     |
+-----------------------------------------------+
^     ^             ^             ^             ^
|     |             |             |             |
0  __global_base  __data_end  __heap_base   MAX_MEMORY
But the following fact surprised me. From https://webassembly.org/docs/security/:
Local variables with unclear static scope (e.g. are used by the address-of operator, or are of type struct and returned by value) are stored in a separate user-addressable stack in linear memory at compile time. This is an isolated memory region with fixed maximum size that is zero initialized by default.
and from https://github.com/WebAssembly/design/blob/main/Rationale.md#locals:
C/C++ makes it possible to take the address of a function's local values and pass this pointer to callees or to other threads. Since WebAssembly's local variables are outside the address space, C/C++ compilers implement address-taken variables by creating a separate stack data structure within linear memory. This stack is sometimes called the "aliased" stack, since it is used for variables which may be pointed to by pointers.
In other words, the stack defined from __heap_base - 1 to __data_end is an implementation artifact of C/C++ compiled modules. The "WASM stack" lives outside linear memory. It just happens that, when you take the address of a local (for example), the compiler stores it in the "aliased stack" instead so there's an address to take.
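To make the "separate stack data structure within linear memory" idea concrete, here is a minimal sketch in plain C that imitates it by hand. Everything in it is an assumption made for illustration: the linear_memory array, the stack_pointer variable, the 16-byte slot alignment and all function names are hypothetical, and a real compiler generates the equivalent bookkeeping itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for linear memory; a real module gets it from the runtime. */
#define LINEAR_MEMORY_SIZE 1024u
static uint8_t linear_memory[LINEAR_MEMORY_SIZE];

/* Hypothetical stack pointer: starts at the top of the region and grows downwards. */
static uint32_t stack_pointer = LINEAR_MEMORY_SIZE;

/* Round a size up to 16 bytes (the alignment is an assumption of this sketch). */
static uint32_t align_up(uint32_t size)
{
    return (size + 15u) & ~15u;
}

/* Reserve a slot on the imitated aliased stack and return its offset. */
static uint32_t aliased_stack_alloc(uint32_t size)
{
    uint32_t aligned = align_up(size);
    assert(aligned <= stack_pointer); /* fixed maximum size: no room means overflow */
    stack_pointer -= aligned;
    return stack_pointer;
}

/* Release the most recently reserved slot. */
static void aliased_stack_free(uint32_t size)
{
    stack_pointer += align_up(size);
}

/* Callee that modifies a value through its "address" (an offset into linear memory). */
static void increment_at(uint32_t address)
{
    int32_t value;
    memcpy(&value, &linear_memory[address], sizeof value);
    ++value;
    memcpy(&linear_memory[address], &value, sizeof value);
}

/* Imitates a function whose local has its address passed to a callee:
   the local is spilled to linear memory so that it has an address at all. */
int32_t increment_via_pointer(int32_t input)
{
    uint32_t slot = aliased_stack_alloc(sizeof(int32_t));
    memcpy(&linear_memory[slot], &input, sizeof input);
    increment_at(slot);
    int32_t result;
    memcpy(&result, &linear_memory[slot], sizeof result);
    aliased_stack_free(sizeof(int32_t));
    return result;
}
```

Calling increment_via_pointer(41) returns 42, and the value only ever becomes addressable because it was copied into the array that plays the role of linear memory.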
Doesn't this behavior open the door to a new kind of very dangerous data race when shared memory is used?
Imagine a piece of code like this:
int calculation(int param1, int param2)
{
    if (param1 == param2 * 2)
        ++param1;
    else
        ++param2;
    return param1 / 3 + param2;
}
Here, calculation is thread-safe. However, if I replace calculation with this equivalent form:
int calculation(int param1, int param2)
{
    int* param = param1 == param2 * 2 ? &param1 : &param2;
    ++*param;
    return param1 / 3 + param2;
}
Depending on the compiler's output, calculation might no longer be thread-safe if param1 and/or param2 are stored on the aliased stack, which lives in linear memory, which in turn could be shared with other instances if shared memory is enabled via the --features=atomics,bulk-memory --shared-memory flags.
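For reference, the two forms can be placed side by side in one translation unit to confirm that they compute the same result. This is only a sketch: the names calculation_by_value and calculation_by_pointer are hypothetical renames so that both versions can coexist, and the code says nothing about where a given compiler actually places param1 and param2.

```c
/* First form: no address is taken, so the parameters never need an address. */
int calculation_by_value(int param1, int param2)
{
    if (param1 == param2 * 2)
        ++param1;
    else
        ++param2;
    return param1 / 3 + param2;
}

/* Second form: the address of one parameter is taken, so the compiler
   may (or may not) decide to place the parameters in linear memory. */
int calculation_by_pointer(int param1, int param2)
{
    int* param = param1 == param2 * 2 ? &param1 : &param2;
    ++*param;
    return param1 / 3 + param2;
}
```

Both functions return 3 for the arguments (4, 2) and 2 for (1, 1); the open question is only whether the generated code for the second one touches shared linear memory.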
So, in which exact situations can the compiler decide to store a local variable on the aliased-stack?
EDIT: I ran some tests to verify this, and I would like to know whether I'm right. In a function that uses 16 unsigned local variables, I stored on the heap the memory addresses of the first, the middle and the last local, and printed them from JavaScript. The difference between the lowest stored address and __heap_base was 32*3 bytes + padding, and not 32*16 + padding, which means that only the three variables whose addresses were taken were stored on the aliased stack. Of course, these tests are not thread-safe because I'm storing the addresses of locals outside the function, but they illustrate the point: if, in a re-entrant function, I temporarily take the address of a local for implementation convenience and the code is complex enough that the compiler isn't sure what I'm trying to do, it could end up storing the local on the aliased stack instead of optimizing the address-taking away, making the function thread-unsafe.
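The test was along the lines of the sketch below. It is a reconstruction under assumptions, not the exact code: the names recorded_addresses and sum_of_sixteen are hypothetical, a global array stands in for the heap storage, and the choice of v7 as the "middle" local is arbitrary.

```c
#include <stdint.h>

/* Hypothetical storage that outlives the function; the original test used the heap. */
uintptr_t recorded_addresses[3];

/* Uses 16 unsigned locals but takes the address of only three of them. */
unsigned sum_of_sixteen(unsigned seed)
{
    unsigned v0 = seed,       v1 = seed + 1,   v2 = seed + 2,   v3 = seed + 3;
    unsigned v4 = seed + 4,   v5 = seed + 5,   v6 = seed + 6,   v7 = seed + 7;
    unsigned v8 = seed + 8,   v9 = seed + 9,   v10 = seed + 10, v11 = seed + 11;
    unsigned v12 = seed + 12, v13 = seed + 13, v14 = seed + 14, v15 = seed + 15;

    /* Only these three locals are forced to have an address. The stored values
       are kept as integers and never dereferenced after the function returns. */
    recorded_addresses[0] = (uintptr_t)&v0;
    recorded_addresses[1] = (uintptr_t)&v7;
    recorded_addresses[2] = (uintptr_t)&v15;

    return v0 + v1 + v2 + v3 + v4 + v5 + v6 + v7
         + v8 + v9 + v10 + v11 + v12 + v13 + v14 + v15;
}
```

After a call, the three recorded values can be read from the host side and compared against __heap_base to see how much of the aliased stack the function actually used.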