Atomic op solution. This is a very fast way to get memory for your arguments.
If the arguments are always the same size, pre-allocate a bunch of them. Add a pNext to your struct to link them together. Create a _pRecycle global that holds all the available ones as a linked list, using pNext to link them. When you need an argument, use atomic ops to CAS one off the head of the recycle list. When you are done, use atomic ops to put the arg back at the head of the recycle list.
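CAS means compare-and-swap, for example a thin wrapper around GCC's __sync_bool_compare_and_swap, which returns 1 on success. Note that the snippets here write the arguments as CAS(address, new value, expected old value), whereas the builtin itself takes (address, old value, new value).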
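A minimal sketch of such a wrapper, assuming GCC or Clang. The macro name and its (address, new, old) argument order are inferred from how the snippets in this answer call it; cas_demo is a hypothetical helper that only shows the success and failure cases on one thread.

```c
#include <assert.h>

/* Assumed wrapper: CAS(ptr, newval, oldval) stores newval into *ptr only if
   *ptr still equals oldval. Returns 1 on success, 0 otherwise.
   The builtin itself takes (ptr, oldval, newval), so the macro reorders. */
#define CAS(ptr, newval, oldval) \
    __sync_bool_compare_and_swap((ptr), (oldval), (newval))

/* Hypothetical single-threaded demonstration. */
static int cas_demo(void)
{
    long v = 5;
    int ok_first  = CAS(&v, 7L, 5L); /* v is 5, so v becomes 7 */
    int ok_second = CAS(&v, 9L, 5L); /* v is 7, not 5: nothing is stored */
    assert(v == 7);
    return ok_first && !ok_second;
}
```

Calling cas_demo() returns 1: the first swap succeeds and the second is rejected because the expected old value no longer matches.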
To grab argument memory:
while (1) { // concurrency loop
    pArg = _pRecycle; // _pRecycle is the global pointer to the head of the available arguments
    if (pArg == NULL)
        break; // list is empty: nothing left to hand out
    // POINT A
    if (CAS(&_pRecycle, pArg->pNext, pArg)) // change _pRecycle to the next item if _pRecycle hasn't changed
        break; // success
    // POINT B
}
// if pArg is non-NULL you can now use it to pass arguments
To recycle argument memory when done:
while (1) { // concurrency loop
    pArg->pNext = _pRecycle;
    if (CAS(&_pRecycle, pArg, pArg->pNext)) // change _pRecycle to pArg if _pRecycle hasn't changed
        break; // success
}
// you have given the mem back
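Putting the pieces so far together, here is a rough, self-contained sketch of the unversioned pool. POOL_SIZE, the payload field, _pool, and the function names pool_init, arg_get, and arg_put are all made up for illustration, and the CAS macro is the assumed (address, new, old) wrapper around the GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed wrapper with the (address, new, old) order used in this answer. */
#define CAS(ptr, newval, oldval) \
    __sync_bool_compare_and_swap((ptr), (oldval), (newval))

#define POOL_SIZE 4 /* hypothetical pool size */

typedef struct TArg {
    struct TArg *pNext; /* link, only meaningful while on the free list */
    int payload;        /* hypothetical argument data */
} TArg;

static TArg _pool[POOL_SIZE];     /* pre-allocated storage */
static TArg * volatile _pRecycle; /* head of the list of available args */

/* Link every entry into the free list. Call once, before threads start. */
static void pool_init(void)
{
    for (int i = 0; i < POOL_SIZE; i++)
        _pool[i].pNext = (i + 1 < POOL_SIZE) ? &_pool[i + 1] : NULL;
    _pRecycle = &_pool[0];
}

/* Pop one argument off the head, or return NULL if none are available. */
static TArg *arg_get(void)
{
    for (;;) { /* concurrency loop */
        TArg *pArg = _pRecycle;
        if (pArg == NULL)
            return NULL;
        if (CAS(&_pRecycle, pArg->pNext, pArg))
            return pArg;
    }
}

/* Push an argument back onto the head of the list. */
static void arg_put(TArg *pArg)
{
    assert(pArg != NULL);
    for (;;) { /* concurrency loop */
        pArg->pNext = _pRecycle;
        if (CAS(&_pRecycle, pArg, pArg->pNext))
            return;
    }
}
```

After pool_init(), each arg_get() hands out the current head and arg_put() makes the returned argument the new head, so the most recently recycled argument is the next one handed out.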
There is a race condition (the classic ABA problem) if another thread grabs and recycles pArg while this thread is swapped out between POINT A and POINT B: the CAS still succeeds because the head is pArg again, but the next pointer this thread read may no longer be the right next item. If each argument stays in use for a long time, this is unlikely to be a problem. Otherwise you need to version the head of the list. To do that you need to be able to atomically change two things at once. Unions combined with 64-bit CAS to the rescue!
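To make the race concrete, the sketch below replays the bad interleaving on a single thread, with the steps of the two threads written out in order. The nodes a, b, c and the function aba_replay are hypothetical, and the CAS macro is the assumed (address, new, old) wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed wrapper with the (address, new, old) order used in this answer. */
#define CAS(ptr, newval, oldval) \
    __sync_bool_compare_and_swap((ptr), (oldval), (newval))

typedef struct TArg { struct TArg *pNext; } TArg;

static TArg a, b, c;    /* hypothetical pool entries */
static TArg *_pRecycle; /* head of the unversioned list */

/* Replays the interleaving step by step; returns the resulting head. */
static TArg *aba_replay(void)
{
    /* Initial free list: a -> b -> c */
    a.pNext = &b; b.pNext = &c; c.pNext = NULL;
    _pRecycle = &a;

    /* Thread 1, at POINT A: has read the head and its successor, then stalls. */
    TArg *pArg  = _pRecycle;   /* &a */
    TArg *pNext = pArg->pNext; /* &b */

    /* Thread 2 runs meanwhile: grabs a, grabs b (and keeps using b),
       then recycles a. */
    _pRecycle = a.pNext;       /* grab a: head = b */
    _pRecycle = b.pNext;       /* grab b: head = c */
    a.pNext = _pRecycle;       /* recycle a: a -> c */
    _pRecycle = &a;

    /* Thread 1 resumes: the head is &a again, so its CAS succeeds anyway. */
    assert(_pRecycle == pArg);
    if (!CAS(&_pRecycle, pNext, pArg))
        return NULL;
    return _pRecycle;          /* &b, which thread 2 is still using */
}
```

The list head ends up pointing at b even though b was never recycled, so the next grab would hand out memory that is still in use. Adding a version number to the head makes that final CAS fail instead.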
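typedef union _RecycleList {
    struct {
        int iVersion;
        TArg *pHead;
    }; // anonymous struct, so the members are accessed directly as x.iVersion and x.pHead
    unsigned long long n64; // this value is iVersion and pHead at the same time!
    // (assumes 32-bit pointers so the pair fits in 64 bits; with 64-bit pointers the pair needs a 128-bit CAS)
} TRecycleList;
TRecycleList _Recycle;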
To get mem:
while (1) // concurrency loop
{
    TRecycleList Old, New;
    Old.n64 = _Recycle.n64;
    New.n64 = Old.n64;
    if (New.pHead == NULL) { pArg = NULL; break; } // list is empty
    New.iVersion++;
    pArg = New.pHead;
    New.pHead = New.pHead->pNext;
    if (CAS(&_Recycle.n64, New.n64, Old.n64)) // if version and head are unchanged we get mem
        break; // success
}
To put mem back:
while (1) // concurrency loop
{
    TRecycleList Old, New;
    Old.n64 = _Recycle.n64;
    New.n64 = Old.n64;
    New.iVersion++;
    pArg->pNext = New.pHead;
    New.pHead = pArg;
    if (CAS(&_Recycle.n64, New.n64, Old.n64)) // if version and head are unchanged we release mem
        break; // success
}
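The union layout above only fits in 64 bits when pointers are 32 bits wide. As a variation, not the layout used above, the sketch below stores a 32-bit pool index instead of a pointer and packs it with a 32-bit version into one 64-bit word, so a 64-bit CAS is enough on 64-bit targets too. The names POOL_SIZE, NIL, pack, head_of, version_of, pool_init, arg_get, and arg_put are all hypothetical, and the CAS macro is the assumed (address, new, old) wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wrapper with the (address, new, old) order used in this answer. */
#define CAS(ptr, newval, oldval) \
    __sync_bool_compare_and_swap((ptr), (oldval), (newval))

#define POOL_SIZE 4     /* hypothetical pool size */
#define NIL 0xFFFFFFFFu /* index value meaning "list is empty" */

typedef struct TArg {
    uint32_t next; /* index of the next free entry, or NIL */
    int payload;   /* hypothetical argument data */
} TArg;

static TArg _pool[POOL_SIZE];
/* High 32 bits: version counter. Low 32 bits: index of the head entry. */
static volatile uint64_t _Recycle;

static uint64_t pack(uint32_t version, uint32_t head)
{
    return ((uint64_t)version << 32) | head;
}
static uint32_t head_of(uint64_t w)    { return (uint32_t)w; }
static uint32_t version_of(uint64_t w) { return (uint32_t)(w >> 32); }

/* Link every entry into the free list. Call once, before threads start. */
static void pool_init(void)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++)
        _pool[i].next = (i + 1 < POOL_SIZE) ? i + 1 : NIL;
    _Recycle = pack(0, 0);
}

static TArg *arg_get(void)
{
    for (;;) { /* concurrency loop */
        uint64_t old = _Recycle;
        uint32_t head = head_of(old);
        if (head == NIL)
            return NULL; /* nothing available */
        uint64_t upd = pack(version_of(old) + 1, _pool[head].next);
        if (CAS(&_Recycle, upd, old)) /* fails if version or head changed */
            return &_pool[head];
    }
}

static void arg_put(TArg *pArg)
{
    uint32_t idx = (uint32_t)(pArg - _pool);
    assert(idx < POOL_SIZE);
    for (;;) { /* concurrency loop */
        uint64_t old = _Recycle;
        pArg->next = head_of(old);
        if (CAS(&_Recycle, pack(version_of(old) + 1, idx), old))
            return;
    }
}
```

Every successful grab or release bumps the version, so a thread that stalled with a stale copy of the word fails its CAS and retries, even if the head index happens to be the same as before. The version wraps after 2^32 updates, which this sketch treats as acceptable.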
Since it is very rare for two threads to be executing the code to grab memory at the same instant, you get great performance. Our tests have shown CAS to be as little as 2x slower than just setting _pRecycle = _pRecycle->pNext. 64-bit and 128-bit CAS are just as fast as 32-bit. Basically, it screams. Every once in a while the concurrency loop will execute twice when two threads actually race. One of them will always win, so the race resolves very fast.