Suppose 32-bit Linux: the alignment rules say, for example, that doubles (8 bytes) only need to be aligned to 4 bytes. This means that, if we assume 64-byte cache blocks (a typical value for modern processors), a double can be placed at offset 60 within a block, so it spans 2 different cache blocks (bytes 60-63 of the first block and bytes 0-3 of the next). It could even happen that the two halves of the double lie in 2 different cache blocks belonging to 2 different 4 KB pages.
With that brief introduction to put the question in context, I have a couple of questions:
1- When programming in assembly for maximum performance, is it recommended to prevent these splits by using alignment directives? Or, for some reason I am not aware of, does aligning the double so that it lies within a single cache block make no difference to performance?
2- How will the store instruction be decoded in the case described above (assume a modern Intel microarchitecture)? I know that a normal x86 store instruction is decoded into a micro-fused pair of store-address (str-addr) and store-data (str-data) uops, but in this case, where 2 different cache blocks (and maybe even 2 different 4 KB pages) are involved, will it be decoded into 2 micro-fused str-addr/str-data pairs (one for the first 4 bytes of the double and another for the last 4 bytes)? Or will it be decoded into a single micro-fused pair, with both str-addr and str-data doing twice the work before the store can finally leave the execution port?