1

I need to find out how to do a Bit Scan Reverse (return the index of the highest set bit) for 8-bit, 16-bit, 32-bit and 64-bit integers in platform-independent ANSI C for use in an industrial process.

I know about x86's _BitScanReverse, I can't use that here. GCC's __builtin_clz is also a no go.

Sorry to clarify we're using an ancient company-made internal compiler. Any intrinsics would have to be written in AT&T-style assembly and compiled and then linked to.

I've come up with this:

 //Count leading zeros
 unsigned long long clz(unsigned long long x)
 {
      unsigned long long y = 0;
      unsigned long long n = 64;

      y = x >> 32; if (y != 0) { n = n - 32; x = y; }
      y = x >> 16; if (y != 0) { n = n - 16; x = y; }
      y = x >> 8; if (y != 0) { n = n - 8; x = y; }
      y = x >> 4; if (y != 0) { n = n - 4; x = y; }
      y = x >> 2; if (y != 0) { n = n - 2; x = y; }
      y = x >> 1; if (y != 0) return (n - 2);

      return (n - x);
}
 
//Extrapolate highest set bit from clz
unsigned long long bsr(unsigned long long x)
{
     return (sizeof(x) * CHAR_BIT) - clz(x) - 1;
}

This is painfully slow compared to the bsr instruction available on x86/amd64. So I'd like to know if there are any good options to speed this up for 8088, MIPS-16, MIPS-32, ARM-32, ARM-64 or SOCs capable of basic arithmetic(+,-,*,/,%), bit wise operations (&,|,^,~), and bit shifting (<<, >>).

I also need to know if I need to make adjustments to account for endian-ness. These systems use a mix of big-endian and little-endian.

Must be in ANSI C with the exception of calls to compiled assembly.

Thanks in advance.

dave_thenerd
  • 448
  • 3
  • 10
  • 1
    Unfortunately ISO / ANSI C is garbage for exposing things modern CPUs can do efficiently. Although some compilers do have some pattern-recognition e.g. for compiling some loops to a `popcnt` instruction. Related [Position of least significant bit that is set](https://stackoverflow.com/q/757059) for `bsf` equivalents. Wait, are you saying intrinsics *are* allowed? Like GNU C `31 - __builtin_clz()` will hopefully compile to a `bsr` (without undoing flipping it from bit-index to leading-zero count and back) or `31-lzcnt` depending on compiler options. Or to ARM `clz`. – Peter Cordes Dec 14 '21 at 06:50
  • 1
    why must it be ANSI C? The only efficient way is to use multiple implementations with `#ifdef` to use the correct intrinsic for the current compiler/platform – phuclv Dec 14 '21 at 06:51
  • 2
    @PeterCordes there's a proposal to fix ANSI C, similar to what was done for C++ already http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2827.htm#design-bit-leading.trailing.zeroes.ones – Alex Guteniev Dec 14 '21 at 07:57
  • 2
    If it “Must be in ANSI C with the exception of intrinsics”, why “GCC's __builtin_clz is also a no go”? What is an intrinsic if not a compiler built-in? – Eric Postpischil Dec 14 '21 at 09:32
  • Sorry to clarify by intrinsics I meant compiled assembly, not compiler-specific functions. We're using an ancient company-internal compiler. I'm wondering if there's a way to do the clz function branchless. I'm not very good at writing branchless code this complex. And I know for a fact that our compiler is not capable of optimizing this code to be branchless. – dave_thenerd Dec 15 '21 at 23:42
  • 3
    "calls to compiled assembly.": Probably your best bet for systems that have a machine instruction for this. Write the three-line function in assembly, put a prototype in your header file, and call it good. Even with the function call overhead, it's probably a lot better than whatever your compiler would emit. – Nate Eldredge Dec 15 '21 at 23:51
  • Agreed with @Nate; unless there are any idioms that modern GCC/clang can recognize as equivalent to `__builtin_clz` and compile accordingly, like there are for `popcnt`, probably best to just provide hand-written asm, or even inline asm that can truly inline on targets where `#ifdef` says it's usable. Of course if you're going to use `#if` stuff, you might as well `#if` for builtins / intrinsics. And BTW, for BSF there's POSIX `ffs` which is like TZCNT+1 but works for all-zero. – Peter Cordes Dec 16 '21 at 04:13

0 Answers0