In Intel's processor manual, section 8.2.3.4, it is stated that loads may be reordered with earlier stores to different locations, but not with earlier stores to the same location.
So I understand that the following two operations can be reordered:
x = 1;
y = z;
And that the following two operations cannot be reordered:
x = 1;
y = x;
But what happens when the store and the load are for different locations, but the load completely encompasses the store, e.g.:
typedef union {
    uint64_t shared_var;
    uint32_t individual_var[2];
} my_union_t;
my_union_t var;
var.shared_var = 0;
var.individual_var[1] = 1;
uint64_t y = var.shared_var;
So can 'y' in this case be 0?
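To make the single-threaded case concrete, here is a minimal sketch of it as a compilable function. The helper name `store_half_then_load_whole` is hypothetical, and `volatile` is my addition so that the compiler actually emits the narrow store and the wide load instead of folding them into a constant. The sketch returns the full 64-bit value: on little-endian x86 the expected result is 0x100000000, which would truncate to 0 if assigned to an `int` with GCC, and that would hide what the load really observed.

```c
#include <assert.h>
#include <stdint.h>

typedef union {
    uint64_t shared_var;
    uint32_t individual_var[2];
} my_union_t;

/* The overlap only works if the union is exactly one 64-bit word. */
static_assert(sizeof(my_union_t) == sizeof(uint64_t),
              "union must be exactly 8 bytes");

/* Hypothetical helper: runs the three statements from the question
 * in a single thread and returns what the wide load observed. */
uint64_t store_half_then_load_whole(void)
{
    volatile my_union_t var;      /* volatile: keep the real store and load */

    var.shared_var = 0;           /* 64-bit store */
    var.individual_var[1] = 1;    /* 32-bit store into one half */
    return var.shared_var;        /* 64-bit load covering that store */
}
```

Within a single thread the load has to observe the thread's own earlier store, so the interesting part of the question is what another core is allowed to see.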
EDIT (@Hans Passant): To further explain the situation, I'm trying to see if I can use this technique to devise a sort of quasi-synchronisation between threads without using locked instructions.
So a more specific question is, given a global variable:
my_union_t var;
var.shared_var = 0;
And two threads executing the following code:
Thread 1:
var.individual_var[0] = 1;
int y = __builtin_popcountl(var.shared_var);
Thread 2:
var.individual_var[1] = 1;
int y = __builtin_popcountl(var.shared_var);
Can 'y' be 1 for both threads?
Note: __builtin_popcountl is the GCC builtin that counts the number of bits set in an unsigned long.
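As a baseline for the two-thread question, the sketch below enumerates every interleaving of the four operations (each thread's store followed by its load) under a sequentially consistent model and checks whether both threads can end up with y == 1. It is only a model: it assumes little-endian layout, it does not simulate store buffers or any real x86 behaviour, and the function name `both_see_one_under_sc` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if some sequentially consistent interleaving of the two
 * threads from the question yields y == 1 in both of them. */
bool both_see_one_under_sc(void)
{
    /* A 4-bit mask with exactly two bits set selects which of the four
     * time slots belong to thread 1; the rest belong to thread 2. */
    for (unsigned mask = 0; mask < 16; ++mask) {
        if (__builtin_popcount(mask) != 2)
            continue;

        uint64_t shared = 0;      /* models var.shared_var */
        int y[2] = {0, 0};        /* per-thread popcount result */
        int step[2] = {0, 0};     /* 0 = store pending, 1 = load pending */

        for (int slot = 0; slot < 4; ++slot) {
            int t = ((mask >> slot) & 1) ? 0 : 1;
            if (step[t] == 0)
                /* individual_var[t] = 1 (little-endian layout assumed) */
                shared |= (uint64_t)1 << (32 * t);
            else
                y[t] = __builtin_popcountll(shared);
            step[t]++;
        }

        /* Each thread must have done its store and then its load. */
        assert(step[0] == 2 && step[1] == 2);

        if (y[0] == 1 && y[1] == 1)
            return true;
    }
    return false;
}
```

In this model the function returns false: whichever thread loads last has already seen both stores. So y == 1 in both threads is only possible if the hardware lets each 64-bit load run ahead of the other thread's store becoming visible, which is exactly what I am asking about.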