I need to do some bit manipulation: extract k bits starting at the i-th bit of a 64-bit register, where 9 ≤ k ≤ 12. k may change depending on the value just read; i is counted from the MSB and increases by k after each read. (i and k are only needed to describe the task; neither of them has to appear in the final solution.)
The simplest solution I came up with looks like this in C:
uint32_t result = ((n << i) >> (64 - k));
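For reference, here is that one-liner wrapped as a self-contained function. This is only a sketch: it assumes n is a uint64_t and that i + k ≤ 64, and the name extract_bits_msb is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (name is illustrative only).
 * Returns k bits of n, starting at bit i counted from the MSB.
 * Assumes 1 <= k <= 32 and i + k <= 64, so both shift counts stay below 64
 * and the result fits in 32 bits. */
static inline uint32_t extract_bits_msb(uint64_t n, unsigned i, unsigned k)
{
    /* The left shift drops the i bits already consumed; the right shift
     * moves the next k bits down to the bottom and zero-fills above them. */
    return (uint32_t)((n << i) >> (64 - k));
}
```

Note that n has to be unsigned here: with a signed type the right shift could sign-extend instead of zero-filling.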
I wonder if there is a better way (especially in terms of performance) to do the same thing. For example, I found the BEXTR assembly instruction, which looks like it would do exactly what I need.
Would it work faster? Or maybe there is a better way to do all of that?
I'm not sure that benchmarking it on my single processor would be sufficient, so the question is more about how well the instruction is optimized in microcode.
Upd. 1 Nate Eldredge gave a wonderful reference: uops.info
Upd. 2 Jester and Peter Cordes gave a nice idea of using shrx + bzhi
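As I understand the shrx + bzhi idea, it amounts to a right shift that brings the wanted field down to bit 0, followed by zeroing everything from bit k upward. A minimal sketch in plain C, assuming i + k ≤ 64 and k < 64; the function name is hypothetical. With BMI2 enabled (e.g. -mbmi2 on GCC/Clang) a compiler may pick shrx and bzhi for this pattern, but that is not guaranteed and is worth checking in the generated assembly.

```c
#include <stdint.h>

/* Hypothetical helper (name is illustrative only).
 * Same result as the double-shift version, written as shift + mask.
 * Assumes i + k <= 64 and 1 <= k <= 32. */
static inline uint32_t extract_bits_shift_mask(uint64_t n, unsigned i, unsigned k)
{
    /* Variable-count right shift: bring the field's lowest bit to bit 0.
     * This is the part a BMI2 target could implement with shrx. */
    uint64_t shifted = n >> (64 - i - k);

    /* Keep only the low k bits. This mask is the part a BMI2 target
     * could implement with bzhi. */
    return (uint32_t)(shifted & ((1ULL << k) - 1));
}
```

If the running shift count (64 - i - k) is kept in a variable and decreased by k after each read, i itself never needs to be computed.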