
I would like to implement (in C) a producer/consumer communication mechanism based on shared memory. It replaces a stream-socket communication between a client and a remote server. Nodes in the network share a pool of memory to communicate with each other. The server writes data (produces) into a memory region and the client reads it (consumes). My software currently uses one thread for reading (client side) and one thread for writing (server side). The threads reside on different machines (distributed).

What is the fastest and most reliable way to implement mutual exclusion on access to the shared memory region? (The memory is external to both machines and only referenced by them.) The server should atomically produce data (write) only while the client is not reading; the client should atomically consume data (read) only while the server is not writing.

It is clear I need a pthread-mutex-like mechanism. With pthreads, waiting threads are unblocked via local kernel interrupts. Would a pthread implementation also work in this distributed scenario (the lock variable placed in shared memory, with the PTHREAD_PROCESS_SHARED attribute set)?

Alternatively, how can I implement a fast and reliable mutex that makes the client thread and the server thread access the shared region in turn, ensuring data consistency?

Rod
  • Any reason you don't just change the original implementation to use local UNIX sockets? Just curious. – Jonathon Reinhart May 03 '14 at 15:44
  • I do specifically need to transfer via shared memory, replacing the socket implementation, because I get better performance when moving data! – Rod May 03 '14 at 15:58
  • @Rod And you actually measured that before making that assumption? Considering the network IO overhead you have in any case, I find it hard to believe that the additional overhead would be especially noticeable. – Voo May 03 '14 at 16:00
  • There is no general answer to your question, since all of this is system-dependent. If you are in a modern POSIX environment like Linux, just use mutexes or rwlocks, and specify the process-shared option. – Jens Gustedt May 03 '14 at 16:02
  • @Voo Yes, I actually measured it! I also think the overhead should not be noticeable; I'm just trying to find an optimal implementation! – Rod May 03 '14 at 16:10
  • 2
    I'm also curious what hardware you're running on, that has shared memory via the network. – Jonathon Reinhart May 03 '14 at 16:21
  • The synchronization primitives that you can use will depend on the guarantees provided by the shared memory system. For example, does it guarantee order of operations (and how exactly is that defined in a distributed system)? Are writes of any size atomic? Are atomic instructions available? Etc. Basically, the question needs more information. – Anton May 03 '14 at 16:36
  • @Jens Gustedt Yes, it is a POSIX environment! Do rwlocks and mutexes with PTHREAD_PROCESS_SHARED also fit well in distributed multithreading environments? – Rod May 03 '14 at 16:36
  • @antonm Write operations are atomic and I can guarantee an order for operations. I just need to implement an access policy to avoid reading while writing and vice versa. – Rod May 03 '14 at 16:57
  • @Rod, what is a distributed multithreading environment? Sounds like a contradiction in terms to me. – Jens Gustedt May 03 '14 at 19:00
  • @Jens Gustedt I mean a local multithreaded program (running several threads for several functions). Communication will take one thread to write (on the server) and one thread to read (on the client). This extends the local thread domain, which could now be influenced by external threads (a thread from a remote machine). My doubts are about the use of a mutex in this case. Can a mutex work even outside the local system? PTHREAD_PROCESS_SHARED shares with other processes, but what if they are remote? – Rod May 03 '14 at 20:24
  • Short answer: don't do this. Distributed shared memory (DSM) is difficult to implement. There have been several attempts over the years; none has been widely adopted. You are only buying some illusion of shared memory at the cost of a lot of consistency *and* performance problems. But first of all, I don't have the impression that you yet know enough about POSIX systems to even sensibly start designing such a thing. – Jens Gustedt May 03 '14 at 21:59

1 Answer


So the short answer is: you can use the pthread mutex mechanism as long as pthreads knows about your particular system. Otherwise you'll need to look to the specific hardware/operating system for help.

The long answer is going to be somewhat general, because the question does not provide a lot of detail about the exact implementation of distributed shared memory being used. I will try to explain what is possible; how to do it will be implementation-dependent.

As @Rod suggests, a producer-consumer system can be implemented with one or more mutex locks, and the question is how to implement a mutex.

A mutex can be considered an object with two states {LOCKED, UNLOCKED} and two atomic operations:

  • Lock: if state is LOCKED, block until UNLOCKED. Set state to LOCKED and return.
  • Unlock: set state to UNLOCKED and return.

Often mutexes are provided by the operating system kernel by implementing these operations on an abstract mutex object. For example, some variants of Unix implement mutexes and semaphores as operations on file descriptors. On those systems, pthreads would make use of the kernel facilities.

The advantage of this approach is that user-space programs don't have to care how it's implemented. The disadvantage is that each operation requires a call into the kernel, so it can be relatively slow compared to the next option:

A mutex can also be implemented as a memory location (let's say 1 byte long) that stores either the value 0 or 1 to indicate UNLOCKED and LOCKED. It can be accessed with standard memory read/write instructions. We can use the following (hypothetical) atomic operations to implement Lock and Unlock:

  1. Compare-and-set: if the memory location has the value 0, set it to the value 1, otherwise fail.
  2. Conditional-wait: block until the memory location has the value 0.
  3. Atomic write: set the memory location to the value 0.

Generally speaking, #1 and #3 are implemented using special CPU instructions, and #2 requires some kernel support. This is pretty much how pthread_mutex_lock is implemented.

This approach provides a speed advantage because a kernel call is necessary only when the mutex is contended (i.e., someone else already holds the lock).

Anton
  • This is helpful! Many thanks! My current solution was effectively based on a locking flag (1 byte). The use of atomic operations to implement mutual exclusion is surely needed to avoid data inconsistency. Your consideration about performance is really appreciated. #2 should be what fits my case best. I will let you know soon! – Rod May 04 '14 at 15:34