Here is a relevant questions to me:
Does the CPU fetch the word you are dealing with in a single operation?
Some processor might allow memory access to a variable that happens to be not aligned in memory by doing two fetches one after the other - non atomically of course.
In that case, problems would arise if another thread interjects writing on that area of memory while the first thread had fetched already the first part of the word and then fetches the second part when the other thread has already modified the word.
thread 1 fetches first part of a XXXX
thread 1 fetches second part of a YYYY
thread 2 fetches first part of a XXXX
thread 1 increments double represented as XXXXYYYY that becomes ZZZZWWWW by adding b
thread 1 writes back in memory ZZZZ
thread 1 writes back in memory WWWW
thread 2 fetches second part of a that is now WWWW
thread 2 increments double represented as XXXXWWWW that becomes VVVVPPPP by adding b
thread 2 writes back in memory VVVV
thread 2 writes back in memory PPPP
For keeping it compact I used one character to represent 8 bits.
Now XXXXWWWW
and VVVVPPPP
are going to be representation of total different floating point values than the one you would have expected. That is because you ended up mixing two parts of two different binary representation (IEEE-754) of double variables.
Said that, I know that in certain ARM based architectures data access are not allowed (that would cause a trap to be generated), but I suspect that Intel processors do allow that instead.
Therefore, if your variable a
is aligned, your result can be any of
a+1.23, a+2.34, a+1.23+2.34
if your variable might be mis-aligned (i.e. has got an address that is not a multiple of 8) your result can be any of
a+1.23, a+2.34, a+1.23+2.34 or a rubbish value
As a further note, please bear in mind that even if your environment alignof(double) == 8
that is not necessarily enough to conclude you are not going to have misalignment issues. All depends from where your particular variable comes from. Consider the following (or run it here):
#pragma push()
#pragma pack(1)
struct Packet
{
unsigned char val1;
unsigned char val2;
double val3;
unsigned char val4;
unsigned char val5;
};
#pragma pop()
int main()
{
static_assert(alignof(double) == 8);
double d;
add(d,1.23); // your a parameter is aligned
Packet p;
add(p.val3,1.23); // your a parameter is now NOT aligned
return 0;
}
Therefore asserting alignof()
doesn't necessarily guarantee your variable is aligned. If your variable is not involved in any packing then you should be OK.
Please allow me just a disclaimer for whoever else is reading this answer: using std::atomic<double>
in these situations is the best compromise in term of implementation effort and performance to achieve thread safety. There are CPUs architectures that have special efficient instructions for dealing with atomic variables without injecting heavy fences. That might end up satisfying your performance requirements already.