Is there an x86(_64) instruction which gives index of the highest (or lowest) set (1) bit?

Question

As said. e.g. for the 8-bit(just for example, no byte order considered) integer 00100100 (base 2), is there an instruction gives 5?

possible duplicate of [Bit twiddling: which bit is set?](http://stackoverflow.com/questions/3465098/bit-twiddling-which-bit-is-set) — Will A, Sep 26 '10 at 04:24
For using it from C or C++, see [What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?](https://stackoverflow.com/q/671815). (And my answer on [Find most significant bit (left-most) that is set in a bit array](https://stackoverflow.com/a/54760134) for details) — Peter Cordes, May 22 '21 at 23:19

score 5 · Answer 1 · edited Mar 16 '15 at 05:20

5

Technically, no. There's BSR to find the most significant bit that's set, and BSF to find the least significant bit that's set -- but the smallest item either will work on is a 16-bit word.

edited Mar 16 '15 at 05:20

Parthian Shot

1,390
1
13
24

answered Sep 26 '10 at 04:28

Jerry Coffin

476,176
80
629
1,111

For a byte, just make sure it's zero-extended into a wider register (like you'd normally do anyway). Since BSR and BSF both find bit-indexes, the extra leading zeros don't matter. (Or with LZCNT which is faster on AMD, use 31-lzcnt e.g. `lzcnt eax, eax` / `xor eax, 31` using the same trick GCC uses to implement `__builtin_clz` in terms of `bsr`) – Peter Cordes May 22 '21 at 23:21

score 3 · Answer 2 · answered Sep 26 '10 at 04:25

3

Yep, BSR. However, do note that the bithack page claims that at least on one CPU a sequence (unrolled loop) of bit shift operations is faster than a single BSR.

answered Sep 26 '10 at 04:25

Alex Martelli

854,459
170
1,222
1,395

1

I'm confused by the comment on the bithacks page. BSR by itself doesn't compute the next highest power-of-two. You have to use `64 - BSR(v - 1)`, which is obviously not "a single BSR assembly language instruction". Perhaps I'm just misreading it. – Marcelo Cantos May 21 '13 at 21:46
Not faster. From the bithack page: "On a Athlon™ XP 2100+ I've found the above shift-left and then OR code is as fast as using a single BSR assembly language instruction" – liorda Jun 02 '16 at 09:14
Some old AMD CPUs have slow microcoded `bsr` / `bsf` (which are thoroughly obsolete 10 years after this answer was posted). They're down to about 7 uops on modern AMD, with 3 cycle latency (https://agner.org/optimize/), vs. 1 uop / 3c latency on Intel. (Except for the "false" dependency on the output, from its funky behaviour of leaving the destination unchanged on src=0, documented by AMD but implemented by both AMD and Intel.) (`lzcnt` / `tzcnt` are single-uop on all CPUs that support them.) – Peter Cordes May 22 '21 at 23:28
See also [How can x86 bsr/bsf have fixed latency, not data dependent? Doesn't it loop over bits like the pseudocode shows?](https://stackoverflow.com/q/54509623) re: performance in general and quirks. – Peter Cordes May 22 '21 at 23:28

Is there an x86(_64) instruction which gives index of the highest (or lowest) set (1) bit?

2 Answers2