x86 has a strongly ordered memory model, but does still allow StoreLoad reordering.
Jeff Preshing's blog post, Memory Reordering Caught in the Act, uses exactly that pair of store-then-load sequences as a test case to show that the reordering really can be observed on real hardware. It includes full source code.
Note that each thread has its own architectural state (including all the registers). So thread1's EAX is different from thread2's EAX. Using EBX in thread2 only makes it easier to talk about; it makes no difference to what can happen.
Anyway, both registers can end up with 0. This rarely happens, but it can, because each thread's store can be delayed (in a store buffer or whatever) until after the other thread's load has chosen a value. Having this be legal lets the CPU aggressively use prefetched data to satisfy loads, and to buffer stores so they may not become globally visible right away when they retire. ("retire" means the architectural state (including EIP) of the thread running the instruction has moved on to the next instruction, and the effects are committed.)
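As a rough illustration of the experiment, here is a minimal C++ sketch rather than the asm from the question. It assumes relaxed std::atomic operations are a fair stand-in for the plain stores and loads (they also permit compile-time reordering, which asm would not). The name count_outcomes and the spin-wait start are my own, not taken from Preshing's code, and how often (if ever) the 0/0 outcome shows up depends on the hardware and on timing.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-then-load test `iterations` times and
// counts each (r1, r2) outcome at index r1 * 2 + r2.
std::array<long, 4> count_outcomes(int iterations) {
    std::array<long, 4> counts{};
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        std::atomic<int> ready{0};
        int r1 = -1, r2 = -1;

        // Crude start line so the two threads run their bodies close together.
        auto wait_for_both = [&] {
            ready.fetch_add(1);
            while (ready.load() < 2) std::this_thread::yield();
        };

        std::thread t1([&] {
            wait_for_both();
            x.store(1, std::memory_order_relaxed);   // thread 1: store x ...
            r1 = y.load(std::memory_order_relaxed);  // ... then load y
        });
        std::thread t2([&] {
            wait_for_both();
            y.store(1, std::memory_order_relaxed);   // thread 2: store y ...
            r2 = x.load(std::memory_order_relaxed);  // ... then load x
        });
        t1.join();
        t2.join();

        // Only 0 and 1 can ever be loaded from these aligned ints.
        assert((r1 == 0 || r1 == 1) && (r2 == 0 || r2 == 1));
        ++counts[r1 * 2 + r2];
    }
    return counts;
}
```

Calling count_outcomes(100000) and looking at index 0 gives the number of runs in which both loads returned 0. Spawning threads per iteration keeps the sketch short but makes the race window small, so a zero count there does not show that the reordering is impossible.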
Once the dust settles, both globals always end up as 1, whatever values the threads loaded. All 4 combinations of 0 and 1 in the two threads' registers are possible, including both 1: it's possible for them to see each other's stores. I'm not sure how likely this is; it might require one thread being interrupted after its store but before its load. If both threads are running on the same physical core (hyperthreading), this possibility is much more likely.
Even if the storage for x and y is unaligned and crosses a cache line, 0 and 1 are the only possible values. This is because the two values differ only in the least significant byte, so a torn store can't produce anything else. (C compiler output, and JVMs, will align variables to their natural alignment, making this a non-issue, but you can do anything you want in asm so I thought I'd mention it.)
If you were storing a 32-bit -1 to 4 bytes that span two cache lines, the other thread could load a value of 0x00ffffff or 0xff000000, 0x0000ffff or 0xffff0000, etc. (depending on where the cache-line boundary was), as well as the usual 0 or 0xffffffff (aka -1).
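To make the tearing argument concrete, here is a small C++ model. It is only a sketch under stated assumptions: little-endian byte order (as on x86), an old value already in memory, and a store whose two halves on either side of the cache-line boundary each become visible atomically but in either order. The function name possible_torn_values is made up for illustration.

```cpp
#include <cstdint>
#include <set>

// Values a reader could observe if a 4-byte little-endian store of `new_val`
// over `old_val` were split at any of the three interior byte boundaries.
std::set<std::uint32_t> possible_torn_values(std::uint32_t old_val,
                                             std::uint32_t new_val) {
    std::set<std::uint32_t> seen{old_val, new_val};  // the untorn cases
    for (int split = 1; split <= 3; ++split) {
        // Mask covering the low `split` bytes (the part in the first line).
        std::uint32_t low_mask = (1u << (8 * split)) - 1;
        // Low part already updated, high part still old.
        seen.insert((new_val & low_mask) | (old_val & ~low_mask));
        // High part already updated, low part still old.
        seen.insert((old_val & low_mask) | (new_val & ~low_mask));
    }
    return seen;
}
```

With an old value of 0, storing 1 gives only the set {0, 1}, while storing 0xffffffff gives eight values: 0, 0x000000ff, 0x0000ffff, 0x00ffffff, 0xff000000, 0xffff0000, 0xffffff00 and 0xffffffff. For any one variable the boundary sits at a single fixed offset, so only the two torn values for that split are actually reachable.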
Re: Java: I haven't read up on the Java memory model. Other answers say it even allows compile-time reordering (like C++11's std::atomic rules). Even if it didn't, StoreLoad reordering can still happen at run time without a full memory barrier. So all four results are possible.
This is true even if your JVM is running on an x86 CPU (rather than weakly-ordered hardware like ARM).
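For comparison, here is a sketch of what a full barrier between the store and the load looks like, written in C++ (not Java) to match the std::atomic rules mentioned above. With a sequentially consistent fence in both threads, the C++ memory model rules out the outcome where both loads return 0. The helper names are hypothetical, and the sketch doesn't depend on which instruction the compiler picks for the fence on x86.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one trial of the store / full fence / load pattern.
// Returns true if both threads loaded 0.
bool both_zero_with_fence() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Runs `trials` trials, asserting the 0/0 outcome never appears.
// Returns the number of trials completed.
int run_fenced_trials(int trials) {
    int done = 0;
    for (int i = 0; i < trials; ++i) {
        assert(!both_zero_with_fence());
        ++done;
    }
    return done;
}
```

Removing either fence brings the 0/0 outcome back as a legal result, since a barrier in only one thread still leaves the other thread's store free to be delayed past its load.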
This answer to another question may shed some light on why LFENCE/SFENCE exist on x86, even though they are no-ops in most cases (i.e. when not using movnt or weakly-ordered memory regions like USWC video memory).
Or, just read Jeff Preshing's other blog posts to learn more about memory ordering. I found them really helpful myself.