I have a function that modifies a shared resource in my multi-threaded program. It is the only place where the threads touch the shared resource, and it accounts for only a small fraction of each thread's overall work:
static int64_t
AddToSharedResource(volatile int64_t* value, int64_t to_add)
{
    int64_t result = *value;  // read the current value
    *value += to_add;         // add to the shared total
    return result;            // return the value before the addition
}
I wanted to make my application thread-safe, so I added a simple mutex lock around those instructions:
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static int64_t
AddToSharedResource(volatile int64_t* value, int64_t to_add)
{
    pthread_mutex_lock(&lock);    // serialize access to the shared value
    int64_t result = *value;
    *value += to_add;
    pthread_mutex_unlock(&lock);
    return result;
}
Doing so makes my program more than 10x slower, to the point that it is slower than the single-threaded version!
After reading up a bit more, this seems to be caused by macOS's mutex implementation, which uses "fair" mutexes rather than spinlocks. There are trade-offs between the two approaches, but this appears to be one of the cases where fair mutexes perform badly. However, the reason I wrote the code this way is that I had already written the program for Win32 (where the lock caused barely any performance penalty), and I'm planning to port the function to Linux as well.
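Since the whole critical section is just a fetch-and-add, I've been wondering whether something like C11 atomics could sidestep the lock entirely. Below is a rough, untested sketch of what I have in mind (the name and signature are just for illustration, and it would mean changing the shared variable to an _Atomic type):

#include <stdatomic.h>
#include <stdint.h>

// Untested sketch: the critical section is a single fetch-and-add,
// so one atomic read-modify-write might be able to replace the mutex.
static int64_t
AddToSharedResourceAtomic(_Atomic int64_t* value, int64_t to_add)
{
    // atomic_fetch_add adds to_add and returns the previous value
    // (sequentially consistent by default).
    return atomic_fetch_add(value, to_add);
}

I'm not sure whether that is the right direction, though, or how it interacts with the rest of my platform layer, which is why I'm asking: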
Is there a way to make this function thread-safe on macOS without creating a huge bottleneck, or do I need to redesign the platform layer?