In programming languages that support multithreading, we are always taught to use atomic operations, locks, or channels to synchronize threads that read or write the same variable at the same time.
My question is: if a variable's size in bytes is no larger than the machine word size, can the CPU perform a read or write of it atomically? And if so, can we simply write concurrent access code in a high-level language without any of that synchronization?