There are two reasonable approaches.
One is yours: grab the low n bits of y, nuke the middle n bits of x, and "or" them into place.
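Here is a minimal sketch of that first approach. It assumes the convention the p-n+1 arithmetic below implies: bit 0 is the rightmost bit, and the field is the n bits ending at position p (bits p down to p-n+1). The function name setbits_masked is hypothetical, and the asserted preconditions are assumptions that keep every shift well defined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: "grab, nuke, or". Field = bits p down to p-n+1,
 * with bit 0 as the rightmost bit (assumed convention). */
unsigned setbits_masked(unsigned x, int p, int n, unsigned y)
{
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);
    /* Assumed preconditions: field fits in x, and ~0u << n is defined. */
    assert(n > 0 && n < width && p < width && p + 1 >= n);

    int shift = p + 1 - n;          /* index of the field's lowest bit */
    unsigned low_n = ~(~0u << n);   /* n one-bits at the bottom */

    unsigned y_bits = (y & low_n) << shift;   /* grab low n bits of y, move them up */
    unsigned x_rest = x & ~(low_n << shift);  /* nuke the field in x */
    return x_rest | y_bits;                   /* "or" them into place */
}
```

For example, setbits_masked(0xFFu, 4, 3, 0x5u) clears bits 4..2 of 0xFF (leaving 0xE3) and ors in 101 shifted left by 2 (0x14), giving 0xF7.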
The other is to build the answer from three parts: the low bits of x, "or" the low n bits of y shifted into the middle, "or" the high bits of x.
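A sketch of the three-part version, under the same assumed convention (bit 0 rightmost, field = bits p down to p-n+1). The name setbits_three_parts is hypothetical; the precondition p + 1 < width is an assumption needed so that ~0u << (p + 1) is well defined in C.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: low bits of x | shifted low n bits of y | high bits of x. */
unsigned setbits_three_parts(unsigned x, int p, int n, unsigned y)
{
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);
    /* Assumed precondition: shifting by p + 1 must be less than the bit width. */
    assert(n > 0 && p + 1 >= n && p + 1 < width);

    int shift = p + 1 - n;

    unsigned low    = x & ~(~0u << shift);         /* bits of x below the field */
    unsigned middle = (y & ~(~0u << n)) << shift;  /* low n bits of y, moved up */
    unsigned high   = x & (~0u << (p + 1));        /* bits of x above the field */
    return low | middle | high;
}
```

This does a little more work than the first approach (three pieces instead of two), which is part of why the first one tends to win.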
I think I actually like your version better, because I bet n and p are more likely to be compile-time constants than x and y. So your answer becomes two masking operations with constants and one "or"; I doubt you will do better.
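To make the constant case concrete, here is a hypothetical fixed field of n = 3 bits ending at p = 4. The field mask ~(~0u << 3) << 2 works out to binary 11100 (0x1C), so everything that depends only on n and p is a constant; the static_assert checks that arithmetic at compile time. The function and macro names are made up for illustration, and whether a given compiler folds the expressions this way is up to that compiler.

```c
#include <assert.h>

#define FIELD_SHIFT 2u     /* p + 1 - n = 4 + 1 - 3 */
#define FIELD_MASK  0x1Cu  /* bits 4..2 */

/* C11: confirm the hand-computed mask matches the general formula. */
static_assert((~(~0u << 3) << 2) == FIELD_MASK, "field mask for n = 3, p = 4");

/* Hypothetical fixed-field version: one AND with a constant, a shift and
 * AND with a constant, and one OR. */
unsigned set_bits_4_to_2(unsigned x, unsigned y)
{
    return (x & ~FIELD_MASK) | ((y << FIELD_SHIFT) & FIELD_MASK);
}
```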
I might modify it slightly to make it easier to read:
unsigned mask = (~0u << (p + 1) | ~(~0u << (p - n + 1)));  /* 0s exactly over bits p..p-n+1; needs p + 1 < bit width */
unsigned result = (mask & x) | (~mask & (y << (p - n + 1)));
...but this is the same speed (indeed, the same code) as yours when mask is a constant, and quite possibly slower when mask is a variable.
Finally, make sure you have a good reason to worry about this in the first place. Clean code is good, but for something this short, put it in a well-documented function and it does not matter that much. Fast code is good, but do not attempt to micro-optimize something like this until your profiler tells you to. (Modern CPUs do this stuff very fast; it is unlikely your application's performance is bounded by this sort of function. At the very least it is "innocent until proven guilty".)
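As an illustration of "a well-documented function", here is one possible shape for it, wrapping the readable mask formula. The name and parameter order are assumptions modeled on K&R's setbits(x, p, n, y) exercise, and the documented requirements are the conditions under which every shift stays well defined.

```c
#include <assert.h>
#include <limits.h>

/*
 * setbits (hypothetical): return x with the n bits that end at position p
 * (bits p down to p-n+1, bit 0 being the rightmost) replaced by the
 * rightmost n bits of y. All other bits of x are unchanged.
 *
 * Requires: 0 < n <= p + 1, and p + 1 < the bit width of unsigned.
 *
 * Example: setbits(0xFF, 4, 3, 0x5) == 0xF7
 */
static inline unsigned setbits(unsigned x, int p, int n, unsigned y)
{
    assert(n > 0 && p + 1 >= n && p + 1 < (int)(sizeof(unsigned) * CHAR_BIT));

    /* keep: 1s everywhere except over the field */
    unsigned keep = ~0u << (p + 1) | ~(~0u << (p - n + 1));
    return (keep & x) | (~keep & (y << (p - n + 1)));
}
```

Because the function is small and static inline, callers that pass constant n and p give the compiler the chance to reduce it to the constant-mask form discussed above.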