XCode and _bittest function

Question

I've got a little C++ project that was developed for Win32 and I want to port it to OSX. The code uses functions like _bittest and _bittest64 but I haven't found same functions in the XCode header files.

What could be an alternative for these functions? May be there are good working polyfills. The project is a legacy indeed, no extra performance is required at the moment.

Have you tried to search for documentation about these functions, to see what they do? That way you can more easily search for (or perhaps implement yourself) replacements. — Some programmer dude, Jun 13 '18 at 08:29
Basically a duplicate of [Using bts assembly instruction with gcc compiler](https://stackoverflow.com/q/1983303). As my answer explains, you don't want the actual instruction on modern x86. (Although compilers have missed optimizations for using the register-register version of `bt` and `bts`, which are fast on both Intel and AMD.) — Peter Cordes, Jun 13 '18 at 09:39

IInspectable · Accepted Answer · 2018-06-14T08:16:43.297

5

The _bittest and _bittest64 symbols are compiler intrinsics, that emit Bit-test instructions, specifically x86 bt, to examine the value of a bit at a zero-based index.

With a memory operand, bt has crazy-CISC bitstring behaviour where the bit index can go outside the dword/qword of memory selected by the addressing mode. This is slow and why compilers will load the operand into a register first. But this is what the MSVC intrinsic is for. Otherwise it wouldn't need to be an intrinsic.

The following C++ matches the behaviour of register-arg version of the bt instruction, wrapping the shift count at the register width, i.e. effectively looking only at the low bits. (This matches the MSVC intrinsic if b is <32 or <64.) See the updated code and comments for discussion of how to implement the MSVC semantics which let it access outside the pointed-to long or long long.

Also beware that long is a 32-bit type in the x64 Windows ABI, but a 64-bit type in the x86-64 System V ABI (which you're using on OS X, unless you build obsolete 32-bit code). You may want to change your code to int32_t or uint32_t to avoid leaving unused bits in each long, depending on how you're using it.

inline
unsigned char bittest(long const *a, long b)
{
    auto const value{ *a };
    auto const mask{ 1L << (b&31) };
    auto const masked_value{ value & mask };
    return unsigned char{ masked_value != 0 };
}

inline
unsigned char bittest64(long long const *a, long long b)
{
    auto const value{ *a };
    auto const mask{ 1LL << (b&63) };
    auto const masked_value{ value & mask };
    return unsigned char{ masked_value != 0 };
}

I'm not aware of any GCC or Clang intrinsics with identical functionality. If needed, you could resort to emitting assembly instructions from the function implementations instead, but bt with a memory operand is slow so it's normally best to implement in pure C++ and let the compiler do a good job.

Update:

After discussing the code emitted from the intrinsics, it has become clear, that the previously proposed replacement code only covers part of the functionality. In particular, the intrinsics allow indexing bits outside the memory occupied by *a. The following implementations account for that as well.

inline
unsigned char bittest(std::int32_t const *a, std::int32_t b)
{
    auto const bits{ reinterpret_cast<unsigned char const*>(a) };
    auto const value{ bits[b >> 3] };
    auto const mask{ (unsigned char)(1 << (b & 7)) };
    return (value & mask) != 0;
}

inline
unsigned char bittest64(std::int64_t const *a, std::int64_t b)
{
    auto const bits{ reinterpret_cast<unsigned char const*>(a) };
    auto const value{ bits[b >> 3] };
    auto const mask{ (unsigned char)(1 << (b & 7)) };
    return (value & mask) != 0;
}

edited Jun 14 '18 at 08:16

answered Jun 13 '18 at 09:10

IInspectable

46,945
8
85
181

Why take a pointer arg at all here? This isn't asm; it's up to the compiler where the value is after inlining. So it's easier to just take an arg by value. You want these functions inline anyway so you don't get instructions to actually create a boolean return value in an integer register. – Peter Cordes Jun 13 '18 at 09:39
@PeterCordes: Those functions are meant to be 1:1 replacements, that's why I left the interface unchanged. The OP is porting code, and presumably doesn't want to modify the calling code. – IInspectable Jun 13 '18 at 09:42
Hmm, I added `&31` and `&63` to the shift count because x86 does mask the shift count. But a portable version of this should probably use `sizeof(long)*CHAR_BIT - 1`, or the C++ equivalent. – Peter Cordes Jun 13 '18 at 09:42
Oh right, I forgot about the ultimate source of the question being the MSVC intrinsics :P. Hmm, I'll double-check that the intrinsic isn't actually for the memory-destination version, where we do need to handle indexing outside `*a`. – Peter Cordes Jun 13 '18 at 09:43
1

@PeterCordes: A portable version should probably leave the code as is, and `static_assert` on the actual integer sizes. After all, the source implementation expects 32 and 64 values, and doesn't adjust to the actual sizes used by the language implementation. – IInspectable Jun 13 '18 at 09:45
Or possibly use `int64_t` / `int32_t`? No probably best to stick with `long` and `long long` for alias-compatibility. – Peter Cordes Jun 13 '18 at 09:47
@PeterCordes: Unfortunately, the [fixed width integer types](http://en.cppreference.com/w/cpp/types/integer) are optional in C++. A conforming implementation can decide against supplying any of them. – IInspectable Jun 13 '18 at 09:49
Yeah, but it's more or less safe to assume that any implementation where `long` or `int` is a 2's complement signed type with the right width will have `int32_t`. That *is* what we want to check (except I guess we don't need to require 2's complement. Probably we should cast to `uint32_t`, because MSVC `long` is a 32-bit type, but it's a 64-bit type on x86-64 MacOS. So bit-indexing will be broken anyway. – Peter Cordes Jun 13 '18 at 09:56
I checked, and MSVC `_bittest` does indeed use the bit-index directly with a memory operand. https://godbolt.org/g/Ev42LS. I guess that's the whole point, and why it needs to be an intrinsic instead of a peephole optimization: it *can* index outside `*a`, specifically to `a[b>>3]`. Probably we want `unsigned char` and guarantee no padding. – Peter Cordes Jun 13 '18 at 09:59
@PeterCordes: In other words: The replacement functions need to be extended, to allow arbitrary indexing. I'm not sure I understand, where `unsigned char` comes into play. Are you proposing to change the interface? If not, then I'm not sure what to gain from casting to `unsigned char`. – IInspectable Jun 13 '18 at 10:19
`unsigned char` is strict-aliasing safe (like I assume `_bittest` is on MSVC, if MSVC even optimizes based on strict-aliasing at all). And it gives us *portable* bit-access to the object representation of `long`. We don't have this in general because `long` is allowed to have padding, but `unsigned char` isn't. We can take a `const long *` and cast it to `const char *` inside the function. – Peter Cordes Jun 13 '18 at 10:21
@PeterCordes: Would [this implementation](https://godbolt.org/g/kmGgoF) exhibit identical behavior to the `_bittest` intrinsic? – IInspectable Jun 13 '18 at 10:56
Yes, I just tested the real instruction and a signed array index is correct. (e.g. `mov edx, -20*8` / `bt [buf+100], edx` does access `[buf+80]`, in 64-bit mode where pointers are 64-bit, proving that *sign*-extending the count to 64-bit is correct, like your code does. And BTW, it's *much* easier to read when optimized. You left out `-Ox` / `-O3`. https://godbolt.org/g/PBUbV9) Oh, I didn't check that `b >>3` is correct, vs. `b / 8`. They round differently for negative `b`, and the ISA ref manual says "count DIV 32" for the dword accessed by the 32-bit operand version. – Peter Cordes Jun 13 '18 at 11:35
`>> 3` is correct. We need a non-negative remainder to index into the selected `char`. I double-checked actual behaviour with `mov edx, -10` / `bt [buf+8], edx`, and it set `[buf+6]` to `0x40` (from 0). `bts` uses the same indexing as `bt`. – Peter Cordes Jun 13 '18 at 12:03
And BTW, it needs to be an arithmetic right shift. Not sure if `int32_t` guarantees *that*, or if it's still implementation defined. – Peter Cordes Jun 13 '18 at 12:16
Nice update. Unfortunately, using `char*` and splitting the bit index into byte / bit components probably doesn't optimize away for the case of `b=13` or something. Hopefully these intrinsics only get used on arrays that are already in memory, and not on a single scalar variable, in which case it might not matter (much). – Peter Cordes Jun 14 '18 at 08:03
@PeterCordes: I updated the answer to include the code that implements the full set of features of the intrinsics. I didn't find documentation that makes a call one way or another with respect to `int32_t` guaranteeing an arithmetic right shift. It would seem natural that it does, considering that it's guaranteed to use 2's complement, and all compilers emit an `sar` instruction, too. – IInspectable Jun 14 '18 at 08:03
Yeah, I'm not aware of any existing C++ implementations on 2's complement machines that choose to implement signed right shift as logical instead of arithmetic. GNU C might even guarantee that signed right shift is arithmetic (along with integers being 2's complement). This is one of those cases where we should probably consider this portable *enough*, because C++ allows so many gotchas... – Peter Cordes Jun 14 '18 at 08:11

XCode and _bittest function

1 Answers1

Update:

Linked