I'm having trouble pinning down the semantics of the `shared` type qualifier in D. Specifically, if I cast an unshared local variable to `shared` without ever copying its contents or assigning the result of the cast to a `shared` local variable, is that sufficient to guarantee visibility between threads?
(A Very Contrived) Example
import std.concurrency;
import std.random : uniform;
import std.stdio : writefln;

enum size_t N_STOPS = 10;
enum size_t N_DESTS = 10;
enum size_t N_PACKETS = 5;

class Packet {
    Tid[N_STOPS] log;
    size_t idx = 0;

    bool visit(Tid tid) {
        assert(idx < N_STOPS);
        log[idx++] = tid;
        return idx < N_STOPS;
    }

    void print(size_t idNum) {
        writefln("Packet %d: visited %d threads.", idNum, idx);
        for (size_t i = 0; i < idx; ++i) {
            string tid;
            log[i].toString(delegate (const(char)[] sink) { tid ~= sink.idup; });
            writefln("\t%d: %s", i, tid);
        }
    }
}

shared Tid sender;
shared Tid[N_DESTS] destinations;

void awaitVisitor() {
    try {
        for (;;) {
            Packet packet = cast() receiveOnly!(shared Packet);
            bool continueJourney = packet.visit(thisTid());
            Tid dest;
            if (continueJourney)
                dest = cast() destinations[uniform(0, N_DESTS)];
            else
                dest = cast() sender;
            send(dest, cast(shared) packet);
        }
    } catch (Exception ignore) {
        // program is terminating
    }
}

void main() {
    sender = cast(shared) thisTid();
    for (size_t i = 0; i < N_DESTS; ++i)
        destinations[i] = cast(shared) spawn(&awaitVisitor);
    for (size_t i = 0; i < N_PACKETS; ++i) {
        Tid dest = cast() destinations[uniform(0, N_DESTS)];
        Packet packet = new Packet();
        send(dest, cast(shared) packet);
    }
    for (size_t i = 0; i < N_PACKETS; ++i)
        (cast() receiveOnly!(shared Packet)).print(i);
}
Questions
- Is my example defined behavior?
- Will this work as expected? That is, is the cast to `shared` sufficient to guarantee visibility of the entire contents of the packet in the receiving thread?
- Can I substitute pointers to raw memory or structs? For example, `cast(shared) &someStruct`. Basically, does a cast from `void*` to `shared(void*)` guarantee visibility of all previous writes that were performed through that pointer?
- Does a formal specification of the D memory model exist somewhere? I ask because I couldn't find one.
Additional Questions
- Do I need to add additional barriers or synchronization, provided I'm using `send` and `receive` from std.concurrency?
- If I were to do this manually (without using std.concurrency), would it be enough to synchronize on a shared queue when inserting and removing my (otherwise unshared) data container?
- What if I were to use only a single native CAS instruction to transfer a pointer to my data?
Further Details
I'm transferring large blocks of data between threads, and I do not want to copy them. Since these transfers happen at discrete points (i.e., receive-work-send), I don't want any of the baggage associated with the `shared` type qualifier (such as being forced to use atomic operations, disabled optimizations, unnecessary memory barriers, and so on).
Memory models are pedantic things, and it tends to be a very bad idea to violate them. In a number of places, I've seen it stated that the compiler may strictly assume unshared variables are accessible only from the current thread. As such, I'm trying to verify that nothing more than a cast to `shared` when passing the data off to a function is sufficient to guarantee visibility beyond the current thread. So far it seems to work in practice, but it feels at least a bit weird for a cast to have superpowers like that; I'd rather not find out later that I'm actually relying on undefined behavior. In C++ or Java, for example, I would need to manually specify any necessary memory barriers on each side of the transfer point, employ either a mutex or a lock-free data structure, and optionally nullify the local reference to prevent accidental access later.
Looking around, I've found some examples that appear to roughly match what I'm describing with pointers and with structs, but I don't feel like these qualify as official documentation. From the second link:
protect the shared object with a mutex and temporarily cast away shared while the mutex is locked so that you can actually do something with the object - and then make sure that no thread-local references exist when the mutex is released
Note that in that case shared is being cast away, not added, which seems like an important detail to me.
The wording of the FAQ would appear to strongly imply that casting between shared and unshared is defined behavior, provided you never try to use unshared data from two threads at once.
Checking the Type Qualifiers Spec, we see that the programmer must verify correctness when explicitly casting qualifiers around. Unfortunately, this doesn't tell us anything about when moving between shared and unshared is actually allowed.
Otherwise, a CastExpression can be used to force a conversion when an implicit version is disallowed, but this cannot be done in @safe code, and the correctness of it must be verified by the user.
From The D Programming Language:
- The order of reads and writes of shared data issued by one thread is the same as the order specified by the source code.
- The global order of reads and writes of shared data is some interleaving of reads and writes from multiple threads.
...
shared accesses must be surrounded by special machine code instructions called memory barriers, ensuring that the order of reads and writes of shared data is the same as seen by all running threads
...
Combined, the two restrictions lead to dramatic slowdown—as much as one order of magnitude.
...
The compiler optimizes code using non-shared data to the maximum, in full confidence that no other thread can ever access it, and only tiptoes around shared data. [emphasis mine]