I have an input uint64_t X and a number N of its least significant bits that I want to write into the target uint64_t values Y and Z, starting at bit index M in Z. The unaffected parts of Y and Z must not change. How can I implement this efficiently in C++ for the latest Intel CPUs?
It should be efficient when executed in loops. I assume this means there should be no branching: the number of instructions executed should be constant and as small as possible.
M and N are not fixed at compile time. M can take any value from 0 to 63 (target offset in Z), and N is in the range from 0 to 64 (number of bits to copy).
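To make the intended semantics concrete, here is a minimal reference sketch using only portable shifts and masks. It assumes Z is the low word and that any bits running past bit 63 of Z continue at bit 0 of Y; the function name write_bits and the parameter order are hypothetical. It avoids shifting by 64 (undefined behavior in C++) by splitting those shifts in two, and it is meant to pin down the expected result, not to claim it is the fastest possible sequence.

```cpp
#include <cstdint>

// Hypothetical helper: copies the N low bits of X into the 128-bit pair (Y:Z),
// starting at bit M of Z. Assumes 0 <= M <= 63 and 0 <= N <= 64.
// Assumption: Z is the low word, Y receives the bits that overflow past bit 63.
inline void write_bits(std::uint64_t X, unsigned N, unsigned M,
                       std::uint64_t& Y, std::uint64_t& Z)
{
    // Mask with the N low bits set, without branching and without a shift by 64:
    // for N in 0..63 the first term is (1 << N) - 1 and the second term is 0;
    // for N == 64 the first term is 0 and the second term is all ones.
    const std::uint64_t mask =
        ((std::uint64_t{1} << (N & 63)) - 1) | (std::uint64_t{0} - (N >> 6));

    const std::uint64_t bits = X & mask;

    // Low word: M is at most 63, so a plain left shift is well defined.
    Z = (Z & ~(mask << M)) | (bits << M);

    // High word: the logical shift is by (64 - M), which would be 64 for M == 0,
    // so it is split into ">> 1" followed by ">> (63 - M)".
    const std::uint64_t hi_mask = (mask >> 1) >> (63 - M);
    const std::uint64_t hi_bits = (bits >> 1) >> (63 - M);
    Y = (Y & ~hi_mask) | hi_bits;
}
```

For example, with Y = 0, Z = 0, X = 0xFF, N = 8 and M = 60, the four low bits of X land in the top four bits of Z and the remaining four bits land in the bottom four bits of Y.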
illustration: