2

As said. e.g. for the 8-bit(just for example, no byte order considered) integer 00100100 (base 2), is there an instruction gives 5?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Utoah
  • 1,052
  • 1
  • 12
  • 18
  • possible duplicate of [Bit twiddling: which bit is set?](http://stackoverflow.com/questions/3465098/bit-twiddling-which-bit-is-set) – Will A Sep 26 '10 at 04:24
  • For using it from C or C++, see [What is the fastest/most efficient way to find the highest set bit (msb) in an integer in C?](https://stackoverflow.com/q/671815). (And my answer on [Find most significant bit (left-most) that is set in a bit array](https://stackoverflow.com/a/54760134) for details) – Peter Cordes May 22 '21 at 23:19

2 Answers2

5

Technically, no. There's BSR to find the most significant bit that's set, and BSF to find the least significant bit that's set -- but the smallest item either will work on is a 16-bit word.

Parthian Shot
  • 1,390
  • 1
  • 13
  • 24
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • For a byte, just make sure it's zero-extended into a wider register (like you'd normally do anyway). Since BSR and BSF both find bit-indexes, the extra leading zeros don't matter. (Or with LZCNT which is faster on AMD, use 31-lzcnt e.g. `lzcnt eax, eax` / `xor eax, 31` using the same trick GCC uses to implement `__builtin_clz` in terms of `bsr`) – Peter Cordes May 22 '21 at 23:21
3

Yep, BSR. However, do note that the bithack page claims that at least on one CPU a sequence (unrolled loop) of bit shift operations is faster than a single BSR.

Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • 1
    I'm confused by the comment on the bithacks page. BSR by itself doesn't compute the next highest power-of-two. You have to use `64 - BSR(v - 1)`, which is obviously not "a single BSR assembly language instruction". Perhaps I'm just misreading it. – Marcelo Cantos May 21 '13 at 21:46
  • Not faster. From the bithack page: "On a Athlon™ XP 2100+ I've found the above shift-left and then OR code is as fast as using a single BSR assembly language instruction" – liorda Jun 02 '16 at 09:14
  • Some old AMD CPUs have slow microcoded `bsr` / `bsf` (which are thoroughly obsolete 10 years after this answer was posted). They're down to about 7 uops on modern AMD, with 3 cycle latency (https://agner.org/optimize/), vs. 1 uop / 3c latency on Intel. (Except for the "false" dependency on the output, from its funky behaviour of leaving the destination unchanged on src=0, documented by AMD but implemented by both AMD and Intel.) (`lzcnt` / `tzcnt` are single-uop on all CPUs that support them.) – Peter Cordes May 22 '21 at 23:28
  • See also [How can x86 bsr/bsf have fixed latency, not data dependent? Doesn't it loop over bits like the pseudocode shows?](https://stackoverflow.com/q/54509623) re: performance in general and quirks. – Peter Cordes May 22 '21 at 23:28