Basically, the question is which instruction takes less time to execute (or whether they take exactly the same time):
add rax, rbx
; or
or rax, rbx
For example, if I want to access a virtual CPU core efficiently and OR
executes faster, then the code I should write would be something like this (C):
#include <stdlib.h> // malloc, free

typedef struct CPUCore { /* ... */ } CPUCore;
// sizeof(CPUCore) is e.g. 64

// normal allocation
CPUCore* allocNormal(void) {
    CPUCore* inst = (CPUCore*)malloc(sizeof(CPUCore));
    return inst;
}
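#include <stdint.h> // uintptr_t

// aligning allocation
typedef struct Result {
    // free memory with 'free(res.block)'
    // use the object through 'res.ptr'
    CPUCore* block;
    CPUCore* ptr;
} Result;

Result allocAlign(void) {
    // allocating 2 times the bytes in order to ensure a 64-byte block
    // with (address & 63) == 0, so instead of 'ptr + shift'
    // we can use 'ptr | shift'
    Result ret;
    ret.block = (CPUCore*)malloc(sizeof(CPUCore) * 2);
    // do the rounding on the integer value of the address:
    // '&' is not defined for pointers, and 'pointer + 0x40' would
    // advance by 0x40 elements rather than 0x40 bytes
    uintptr_t addr = (uintptr_t)ret.block;
    ret.ptr = (CPUCore*)(addr + 0x40 - (addr & 0x3F));
    return ret;
}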
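
To make the 'ptr | shift' idea from the comments above concrete, here is a minimal sketch of the two address computations I am comparing. It assumes a C11 library that provides aligned_alloc (not every toolchain ships it), and the names CORE_ALIGN, alloc_aligned64, field_add and field_or are made up for this illustration only:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CORE_ALIGN 64u /* assumed alignment, matching the 64-byte example */

/* Hypothetical helper: returns a 64-byte-aligned block of at least `size`
   bytes, or NULL on failure. Release it with free(). */
static void *alloc_aligned64(size_t size) {
    /* aligned_alloc (C11) expects the size to be a multiple of the alignment */
    size_t rounded = (size + CORE_ALIGN - 1) / CORE_ALIGN * CORE_ALIGN;
    return aligned_alloc(CORE_ALIGN, rounded);
}

/* Field address computed with ADD; valid for any base and offset. */
static uintptr_t field_add(uintptr_t base, uintptr_t offset) {
    return base + offset;
}

/* Field address computed with OR; only valid when `base` is 64-byte
   aligned and `offset` is below 64. */
static uintptr_t field_or(uintptr_t base, uintptr_t offset) {
    assert((base & (CORE_ALIGN - 1)) == 0);
    assert(offset < CORE_ALIGN);
    return base | offset;
}
```

When the base is 64-byte aligned its low six bits are zero, so adding an offset below 64 never produces a carry and field_add and field_or return the same address. Whether the OR form is actually any faster is exactly what I am asking.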