19

The following code calls the builtin functions for clz/ctz in GCC and, on other systems, has C versions. Obviously, the C versions are a bit suboptimal if the system has a builtin clz/ctz instruction, like x86 and ARM.

#ifdef __GNUC__
#define clz(x) __builtin_clz(x)
#define ctz(x) __builtin_ctz(x)
#else
static uint32_t ALWAYS_INLINE popcnt( uint32_t x )
{
    x -= ((x >> 1) & 0x55555555);
    x = (((x >> 2) & 0x33333333) + (x & 0x33333333));
    x = (((x >> 4) + x) & 0x0f0f0f0f);
    x += (x >> 8);
    x += (x >> 16);
    return x & 0x0000003f;
}
static uint32_t ALWAYS_INLINE clz( uint32_t x )
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    return 32 - popcnt(x);
}
static uint32_t ALWAYS_INLINE ctz( uint32_t x )
{
    return popcnt((x & -x) - 1);
}

#endif

What functions do I need to call, which headers do I need to include, etc to add a proper ifdef for MSVC here? I've already looked at this page, but I'm not entirely sure what the #pragma is for (is it required?) and what restrictions it puts on MSVC version requirements for compilation. As someone who doesn't really use MSVC, I also don't know whether these intrinsics have C equivalents on other architectures, or whether I have to #ifdef x86/x86_64 as well when #defining them.

GregC
  • 7,737
  • 2
  • 53
  • 67
Dark Shikari
  • 7,941
  • 4
  • 26
  • 38
  • The page you refer to above refers to a function that is part of the .NET runtime, are you trying to build your program for .NET or as a native Windows executable? – Timo Geusch Dec 10 '08 at 13:49
  • 1
    It's a native Windows executable--part of the reason I'm asking is that I've found it rather difficult to find Microsoft documentation pages that actually talk about C these days. – Dark Shikari Dec 10 '08 at 18:04
  • Libcxx implementation https://github.com/llvm-mirror/libcxx/blob/9dcbb46826fd4d29b1485f25e8986d36019a6dca/include/support/win32/support.h#L106-L182 – KindDragon Apr 03 '17 at 10:35

6 Answers6

29

Bouncing from sh0dan code, the implementation should be corrected like this :

#ifdef _MSC_VER
#include <intrin.h>

uint32_t __inline ctz( uint32_t value )
{
    DWORD trailing_zero = 0;

    if ( _BitScanForward( &trailing_zero, value ) )
    {
        return trailing_zero;
    }
    else
    {
        // This is undefined, I better choose 32 than 0
        return 32;
    }
}

uint32_t __inline clz( uint32_t value )
{
    DWORD leading_zero = 0;

    if ( _BitScanReverse( &leading_zero, value ) )
    {
       return 31 - leading_zero;
    }
    else
    {
         // Same remarks as above
         return 32;
    }
}
#endif

As commented in the code, both ctz and clz are undefined if value is 0. In our abstraction, we fixed __builtin_clz(value) as (value?__builtin_clz(value):32) but it's a choice

crazyjul
  • 2,519
  • 19
  • 26
  • 4
    An almost 1-to-1 replacement for `__builtin_clz()` in MSVC is `__lzcnt()`. The hardware must support SSE4 though. [More info](https://msdn.microsoft.com/en-US/library/bb384809.aspx). – rustyx Jan 27 '16 at 10:23
  • 2
    My hardware supports SSE4, but not BMI1, so __lzcnt() compiles but doesn't do what I'd expect, rather working as a BSR. – GregC Mar 14 '17 at 15:36
  • 2
    `31 ^__builtin_clz` is equal to `_BitScanReverse` – KindDragon Feb 16 '18 at 13:16
  • Note that GNU C `__builtin_ctz` and clz also have undefined behaviour when the input value is `0` (https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html); this allows them to inline as a single `bsf` instruction (or `31 ^ bsr` which works for the range of defined outputs). If you need to handle possibly-zero inputs, then you'd want similar wrappers for the GNU C builtins, so the appropriate thing would to have a portability layer around BSF / 31^BSR, and then zero-handling on top of that... And using lzcnt `#ifdef __BMI1__`. – Peter Cordes Oct 17 '20 at 10:04
  • Related: [VS: unexpected optimization behavior with \_BitScanReverse64 intrinsic](https://stackoverflow.com/a/41352456) - the MSVC doesn't expose the destination-unmodified behaviour of the asm instruction, even though it has an API that *could* do that. (So you don't need to initialize the `index` output arg; it doesn't hurt though, the compiler knows it's an output-only operand of the intrinsic.) – Peter Cordes Oct 17 '20 at 10:08
3
  1. The equivalent function for int __builtin_ctz (unsigned int x) in MSVC is unsigned int _tzcnt_u32 (unsigned int a) for 32 bit integer and returns count of trailing zeros. For 64 bit use unsigned __int64 _tzcnt_u64 (unsigned __int64 a) 1.

  2. The equivalent function for int __builtin_clz (unsigned int x) in MSVC is unsigned int _lzcnt_u32 (unsigned int a) for 32 bit integer and returns count of leading zeros. For 64 bit use unsigned __int64 _lzcnt_u64 (unsigned __int64 a) 2

C++ Header: immintrin.h

Cipher
  • 41
  • 2
  • 1
    Not all computers have BMI1, so `lzcnt` may decode as `bsr` and give 31-clz instead of the `clz` you're expecting. There is an MSVC intrinsic for `bsr`, specifically `_BitScanReverse`. They're only equivalent if you would / could have compiled with `gcc -mbmi` (or of course `gcc -march=haswell` or something that includes BMI1). See [Does x64 support imply BMI1 support?](https://stackoverflow.com/q/61422827) for old-hardware compat issues (and current-gen Pentium / Celeron low-end CPUs, thanks Intel) with these intrinsics. This is a useful answer, but only with an [edit] to mention that. – Peter Cordes May 03 '21 at 16:42
  • All processor may not support BMI1 instruction set. However, based on detection of BMI1 instructions availability, lzcnt can be used. Otherwise bsr can be used. – Cipher May 03 '21 at 19:35
  • Yes, exactly. You have to check your CPU and manually use `clz = 31-bsr(x);` to exactly emulate `__builtin_clz` on CPUs without BMI1. But your answer doesn't say that. It wrongly implies that `_lzcnt_u32` will give you identical results to `__builtin_clz` in general. But unlike GCC, MSVC lets you use intrinsics without needing to compile with any `-march=haswell` equivalent to "promise" that you'll only run the binary on CPUs that support some instruction-set extensions. – Peter Cordes May 04 '21 at 04:38
  • (And BTW, to exactly emulate `_lzcnt_u32` without BMI1, you'd want `lzcnt = x==0 ? 32 : 31-bsr(x);`. `__builtin_clz` has undefined behaviour if run with an input of 0, allowing it to compile to just a `bsr(x) ^ 31`, but lzcnt has well-defined behaviour for input == 0.) – Peter Cordes May 04 '21 at 04:40
  • I didn't get the sense of what is "manually use clz = 31-bsr(x)" and why do you want to get leading / trailing zero count for input == 0? – Cipher May 04 '21 at 07:01
  • [BSR](https://www.felixcloutier.com/x86/bsr) doesn't handle input==0, but `_lzcnt_u32` is well-defined to return `32` in that case. If you want to emulate lzcnt in terms of BSR, you need to special-case input == 0. As far as "manually", I mean if you can't assume that your code will always run on a machine with `lzcnt`, you should use `_BitScanReverse(&tmp, x); return 31 - tmp;` to get the same result (for all non-zero x). https://www.felixcloutier.com/x86/lzcnt explains how it differs from BSR. – Peter Cordes May 04 '21 at 07:05
  • Ok... So, you mean there is no way to detect if BMI1 is supported or not and we need to assume always before usage of lzcnt, is it the concern? As per GCC documentation, __builtin_clz / ctz result is undefined for input == 0 as well. So, you wanted to use clz/ctz intrinsics to count total number of bits by providing 0, is it correct? – Cipher May 04 '21 at 07:26
  • Well you could write 2 versions of your function that uses clz, and do runtime dispatching (e.g. via a function pointer that you set once based on `cpuid`). But like most of the rest of BMI1/BMI2, it's usually not worth it to actually dispatch for, just use one version that *does* work everywhere, using BSR. If input==0 is possible for your use-case, then yeah you need some kind of workaround. Like `bsr(x|1)` so there's always a bit for it to find, instead of actually branching. So if x was 0 or 1, you get bsr(x|1) == 0. – Peter Cordes May 04 '21 at 07:39
1

I find it in a korean website https://torbjorn.tistory.com/317 In msvc compiler, you can use __lzcnt(unsigned int) to replace __builtin_clz(unsigned int) in gcc compiler.

dboy
  • 1,004
  • 2
  • 16
  • 24
  • Note that the `lzcnt` instruction requires BMI1. On older CPUs, it executes as `bsr`, giving `31-lzcnt` (and leaving the destination register unmodified for input=0). GCC will only expand `__builtin_clz` as `lznct` if you compile with `-march=haswell` or similar options. – Peter Cordes Oct 17 '20 at 09:56
0

If MSVC has a compiler intrinsic for this, it'll be here:

Compiler Intrinsics on MSDN

Otherwise, you'll have to write it using __asm

GregC
  • 7,737
  • 2
  • 53
  • 67
Ana Betts
  • 73,868
  • 16
  • 141
  • 209
-3

Tested on linux and windows (x86) :

#ifdef WIN32
    #include <intrin.h>
    static uint32_t __inline __builtin_clz(uint32_t x) {
        unsigned long r = 0;
        _BitScanReverse(&r, x);
        return (31-r);
    }
#endif

uint32_t clz64(const uint64_t x)
{
    uint32_t u32 = (x >> 32);
    uint32_t result = u32 ? __builtin_clz(u32) : 32;
    if (result == 32) {
        u32 = x & 0xFFFFFFFFUL;
        result += (u32 ? __builtin_clz(u32) : 32);
    }
    return result;
}
Tanguy
  • 2,227
  • 1
  • 18
  • 8
  • Have you tested performance of your clz64? I wouldn't be surprised that all this branching makes it slower than the OP's implementation. – plamenko Sep 24 '16 at 13:13
  • Use `__builtin_clzll` like a normal person if you want 64-bit integer support on GNU C. Writing it this way will probably stop GCC from using a single 64-bit `bsr` or `lzcnt` in 64-bit builds. (But then you'd have a 64-bit MSVC intrinsic available, too.) – Peter Cordes Oct 17 '20 at 09:59
-4

There are two intrinsics "_BitScanForward" and "_BitScanReverse", which suits the same purpose for MSVC. Include . The functions are:

#ifdef _MSC_VER
#include <intrin.h>

static uint32_t __inline ctz( uint32_t x )
{
   int r = 0;
   _BitScanReverse(&r, x);
   return r;
}

static uint32_t __inline clz( uint32_t x )
{
   int r = 0;
   _BitScanForward(&r, x);
   return r;
}
#endif

There are equivalent 64bit versions "_BitScanForward64" and "_BitScanReverse64".

Read more here:

x86 Intrinsics on MSDN

GregC
  • 7,737
  • 2
  • 53
  • 67
sh0dan
  • 160
  • 8
  • 13
    ctz & clz call the wrong functions (they should be using _BitScanForward & BitScanReverse respectively, not BitScanReverse/BitScanForward) & clz is wrong since it returns the offset of the bit set instead of the number of leading zeroes. – Vitali Dec 16 '11 at 00:54