I'm having trouble pinning down the semantics of the `shared` type qualifier in D. Specifically, if I cast an unshared local variable to `shared` without ever copying its contents or assigning the result of the cast to a `shared` local variable, is that sufficient to guarantee visibility between threads?
(A Very Contrived) Example
import std.concurrency;
import std.random : uniform;
import std.stdio : writefln;

enum size_t N_STOPS = 10;
enum size_t N_DESTS = 10;
enum size_t N_PACKETS = 5;

class Packet {
    Tid[N_STOPS] log;
    size_t idx = 0;

    bool visit(Tid tid) {
        assert(idx < N_STOPS);
        log[idx++] = tid;
        return idx < N_STOPS;
    }

    void print(size_t idNum) {
        writefln("Packet %d: visited %d threads.", idNum, idx);
        for (size_t i = 0; i < idx; ++i) {
            string tid;
            log[i].toString(delegate (const(char)[] sink) { tid ~= sink.idup; });
            writefln("\t%d: %s", i, tid);
        }
    }
}

shared Tid sender;
shared Tid[N_DESTS] destinations;

void awaitVisitor() {
    try {
        for (;;) {
            Packet packet = cast() receiveOnly!(shared Packet);
            bool continueJourney = packet.visit(thisTid());
            Tid dest;
            if (continueJourney)
                dest = cast() destinations[uniform(0, N_DESTS)];
            else
                dest = cast() sender;
            send(dest, cast(shared) packet);
        }
    } catch (Exception ignore) {
        // program is terminating
    }
}

void main() {
    sender = cast(shared) thisTid();
    for (size_t i = 0; i < N_DESTS; ++i)
        destinations[i] = cast(shared) spawn(&awaitVisitor);
    for (size_t i = 0; i < N_PACKETS; ++i) {
        Tid dest = cast() destinations[uniform(0, N_DESTS)];
        Packet packet = new Packet();
        send(dest, cast(shared) packet);
    }
    for (size_t i = 0; i < N_PACKETS; ++i)
        (cast() receiveOnly!(shared Packet)).print(i);
}
Questions
- Is my example defined behavior?
- Will this work as expected? That is, is the cast to `shared` sufficient to guarantee visibility of the entire contents of the packet in the receiving thread?
- Can I substitute pointers to raw memory or structs? For example, `cast(shared) &someStruct`. Basically, does a cast from `void*` to `shared(void*)` guarantee visibility of all previous writes that were performed through that pointer?
- Does a formal specification of the D memory model exist somewhere? I ask because I couldn't find one.
Additional Questions
- Do I need to add additional barriers or synchronization, provided I'm using `send` and `receive` from std.concurrency?
- If I were to do this manually (without using std.concurrency), would it be enough to synchronize on a shared queue when inserting and removing my (otherwise unshared) data container?
- What if I were to use only a single native CAS instruction to transfer a pointer to my data?
Further Details
I'm transferring large blocks of data between threads, and I do not want to copy them. Since these transfers happen at discrete points (i.e., receive-work-send), I don't want any of the baggage associated with the `shared` type qualifier (such as being forced to use atomic operations, disabled optimizations, unnecessary memory barriers, and so on).
Memory models are pedantic things, and it tends to be a very bad idea to violate them. In a number of places, I've seen it stated that the compiler may strictly assume unshared variables are accessible only from the current thread. As such, I'm trying to verify that nothing more than a cast to `shared` when passing the data off to a function is sufficient to guarantee visibility beyond the current thread. So far it seems to work in practice, but it feels at least a bit weird for a cast to have superpowers like that; I'd rather not find out later that I'm actually relying on undefined behavior. In C++ or Java, for example, I would need to manually specify any necessary memory barriers on each side of the transfer point, employ either a mutex or a lock-free data structure, and optionally nullify the local reference to prevent accidental access later.
Looking around, I've found some examples that appear to roughly match what I'm describing with pointers and with structs, but I don't feel like these qualify as official documentation. From the second link:
protect the shared object with a mutex and temporarily cast away shared while the mutex is locked so that you can actually do something with the object - and then make sure that no thread-local references exist when the mutex is released
Note that in that case shared is being cast away, not added, which seems like an important detail to me.
The wording of the FAQ would appear to strongly imply that casting between shared and unshared is defined behavior, provided you never try to use unshared data from two threads at once.
Checking the Type Qualifiers Spec, we see that the programmer must verify correctness when explicitly casting qualifiers around. Unfortunately, this doesn't tell us anything about when moving between shared and unshared is actually allowed.
Otherwise, a CastExpression can be used to force a conversion when an implicit version is disallowed, but this cannot be done in @safe code, and the correctness of it must be verified by the user.
From The D Programming Language:
- The order of reads and writes of shared data issued by one thread is the same as the order specified by the source code.
- The global order of reads and writes of shared data is some interleaving of reads and writes from multiple threads.
...
shared accesses must be surrounded by special machine code instructions called memory barriers, ensuring that the order of reads and writes of shared data is the same as seen by all running threads
...
Combined, the two restrictions lead to dramatic slowdown—as much as one order of magnitude.
...
The compiler optimizes code using non-shared data to the maximum, in full confidence that no other thread can ever access it, and only tiptoes around shared data. [emphasis mine]