17

There are a lot of questions about this on StackOverflow. A lot. However I cannot find an answer that:

  • Works in C#
  • Works for 64-bit integers (as opposed to 32-bit)

Faster than:

private static int Obvious(ulong v)
{
    int r = 0;
    while ((v >>= 1) != 0) 
    {
        r++;
    }
    return r;
}

Or even

int r = (int)(Math.Log(v,2));

I'm assuming a 64-bit Intel CPU here.

One useful reference is the Bit Hacks page and another is fxtbook.pdf However, while these gives useful direction to approach the problem, they do not give a ready answer.

I'm after a re-usable function that can do something similar to _BitScanForward64 and _BitScanReverse64 only for C#.

phuclv
  • 37,963
  • 15
  • 156
  • 475
Andrew Savinykh
  • 25,351
  • 17
  • 103
  • 158
  • 1
    Isn't this essentially the same as http://stackoverflow.com/questions/10439242/count-leading-zeroes-in-an-int32 ? Obviously, you'd have to adjust it for 64 bits, and it gives you the opposite of the number you are looking for, but it conveys the same information. – Taekahn Jul 13 '15 at 02:46
  • @Taekahn adjusting to 64-bit is not trivial at all. Try it. As I acknowledged in the question 32-bit answer does exist on SO. – Andrew Savinykh Jul 13 '15 at 02:58

6 Answers6

20

.NET Core 3.0 added BitOperations.LeadingZeroCount and BitOperations.TrailingZeroCount so you can use them directly. They'll be mapped to the x86's LZCNT/BSR and TZCNT/BSF instructions, hence extremely efficient

int mostSignificantPosition = 63 - BitOperations.LeadingZeroCount(0x1234L);
int leastSignificantPosition = BitOperations.TrailingZeroCount(0x1234L);

Alternatively the most significant bit's position can be calculated like this

int mostSignificantPosition = x == 0 ? 0 : BitOperations.Log2(x) + 1
phuclv
  • 37,963
  • 15
  • 156
  • 475
12

One of the ways of doing this, that is described on the Bit Hacks page linked in the question is leveraging De Bruijn sequence. Unfortunately this page does not give a 64-bit version of said sequence. This useful page explains how De Bruijn sequences can be constructed, and this one gives an example of the sequence generator written in C++. If we adapt the given code we can generated multiple sequences, one of which is given in the C# code below:

public static class BitScanner
{
    private const ulong Magic = 0x37E84A99DAE458F;

    private static readonly int[] MagicTable =
    {
        0, 1, 17, 2, 18, 50, 3, 57,
        47, 19, 22, 51, 29, 4, 33, 58,
        15, 48, 20, 27, 25, 23, 52, 41,
        54, 30, 38, 5, 43, 34, 59, 8,
        63, 16, 49, 56, 46, 21, 28, 32,
        14, 26, 24, 40, 53, 37, 42, 7,
        62, 55, 45, 31, 13, 39, 36, 6,
        61, 44, 12, 35, 60, 11, 10, 9,
    };

    public static int BitScanForward(ulong b)
    {
        return MagicTable[((ulong) ((long) b & -(long) b)*Magic) >> 58];
    }

    public static int BitScanReverse(ulong b)
    {
        b |= b >> 1;
        b |= b >> 2;
        b |= b >> 4;
        b |= b >> 8;
        b |= b >> 16;
        b |= b >> 32;
        b = b & ~(b >> 1);
        return MagicTable[b*Magic >> 58];
    }
}

I also posted my C# port of the sequence generator to github

Another related article not mentioned in the question with decent cover of De Bruijn sequences, can be found here.

Andrew Savinykh
  • 25,351
  • 17
  • 103
  • 158
8

As per my comment, this is a function in C# to count leading zero bits modified for a 64 bit integer.

public static UInt64 CountLeadingZeros(UInt64 input)
{
    if (input == 0) return 64;

    UInt64 n = 1;

    if ((input >> 32) == 0) { n = n + 32; input = input << 32; }
    if ((input >> 48) == 0) { n = n + 16; input = input << 16; }
    if ((input >> 56) == 0) { n = n + 8; input = input << 8; }
    if ((input >> 60) == 0) { n = n + 4; input = input << 4; }
    if ((input >> 62) == 0) { n = n + 2; input = input << 2; }
    n = n - (input >> 63);

    return n;
}

UPDATE:
If you're using a newer version of C#, check to see if this is built-in per the answer below. https://stackoverflow.com/a/61141435/1587755

Taekahn
  • 1,592
  • 1
  • 13
  • 16
  • 1
    According to my performance tests, this is faster than mine, on my input. Well done, thank you! – Andrew Savinykh Jul 13 '15 at 10:06
  • If I may, I am curious what you would use this for? Try as I might, I can't think of any practical applications for it. – Taekahn Jul 13 '15 at 16:05
  • I'm running certain mathematical modeling simulations. Shaving off a few milliseconds here and there allow them to complete faster as they process billions of samples in each batch. – Andrew Savinykh Jul 13 '15 at 20:27
  • At the moment the whole thing is running about 6 times faster than when I started (taking 40 minutes per simulation where as I started with 4 hours per simulation) and the hotspots in the profiler are currently `Array.Clone` and `Dictionary.TryGetValue`. This indicates to me that it's likely that the only things I can optimize further are coming up with better data pruning for making samples smaller. – Andrew Savinykh Jul 13 '15 at 20:34
  • 1
    Interesting. Your job sounds more fun than mine.....but I digress. Thanks for sharing :) – Taekahn Jul 13 '15 at 22:18
  • Finding the most significant bit is far from impractical. It's a very common and useful operation, like finding log2 of a number, or finding how many bytes/bits a number fits into which is used in many encodings. That's why there's a hardware instruction to do that in most modern architectures. However doing such compute intensive tasks in C# is not the efficient way to do. Writing native code that utilizing [SIMD](https://en.wikipedia.org/wiki/SIMD) would be much better – phuclv Apr 10 '20 at 12:47
2

The fastest way to get most significant bit at IL code should be a float conversion and accessing the exponent bits.

Save code:

int myint = 7;
int msb = (BitConverter.SingleToInt32Bits(myint) >> 23) - 0x7f;

An even faster way would be the msb and lsb cpu instructions. As mentioned by phuclv it got availibele at .Net Core 3.0 so I addded a test that is unfortunately not much faster.

As requested here are the BenchmarkDotNet results for 10000 coverts of uint and ulong. The speedup was factor 2 so the BitScanner solution is fast, but can't beat native float conversion.

           Method |     Mean |    Error |   StdDev | Ratio
BitScannerForward | 34.37 us | 0.420 us | 0.372 us |  1.00
BitConverterULong | 18.59 us | 0.238 us | 0.223 us |  0.54
 BitConverterUInt | 18.58 us | 0.129 us | 0.121 us |  0.54
     NtdllMsbCall | 31.34 us | 0.204 us | 0.170 us |  0.91       
 LeadingZeroCount | 15.85 us | 0.169 us | 0.150 us |  0.48
Hessi9
  • 91
  • 4
  • since there is an accepted answer that is different than your answer you should do some speed tests and post results with your answer to show that it is faster. @ me if you do that and I'll up-vote your answer. ... you will need to account for the question specifying that it deals with 64 bit ints also and specifically excludes 32 bit ints. ... you probably just need to delete this answer because of that. Generally speaking though with speed Q&A always post speed tests and results and ideally data-sets to o with them. – Rodger Mar 12 '20 at 22:37
  • 1
    I doubt that your result is correct. `BitOperations.LeadingZeroCount` should be far faster than converting to float then do some manipulation – phuclv Apr 17 '20 at 15:19
  • I added a test for BitOperations.LeadingZeroCount that is faster but disapointing slow. So if you have the luck of .Net Core 3.0 compatible target platform you should use it, if not float convertion is the fastest way. – Hessi9 Apr 18 '20 at 18:33
2

@Taekahn gave a great answer. I'll just improve upon it slightly:

[System.Runtime.CompilerServices.MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int CountLeadingZeros(this ulong input)
{
    const int bits = 64;
    // if (input == 0L) return bits; // Not needed. Use only if 0 is very common.
    int n = 1;
    if ((input >> (bits - 32)) == 0) { n += 32; input <<= 32; }
    if ((input >> (bits - 16)) == 0) { n += 16; input <<= 16; }
    if ((input >> (bits - 8)) == 0) { n += 8; input <<= 8; }
    if ((input >> (bits - 4)) == 0) { n += 4; input <<= 4; }
    if ((input >> (bits - 2)) == 0) { n += 2; input <<= 2; }
    return n - (int)(input >> (bits - 1));
}
  • Avoids the slightly magic numbers, instead making their intent more obvious with (bits - x).
  • Adaption to different wordlengths should now be obvious and trivial.
  • Treating (input == 0) as special is unneccesary and removing this will speed up all other input.
  • Using int for the counter is more reasonable than using UInt64. (Could even make it a byte, but int is the default integer type and supposedly the fastest for every platform.)
  • Added attribute for aggressive inlining, to ensure optimal performance.

There is no need to compute any of the "(bits - x)" at runtime, so the compiler should precompute them. Thus the increased readability comes at no cost.

Edit: As pointed out by @Peter Cordes, you should probably just use System.Numerics.BitOperations.LeadingZeroCount if you have the BitOperations class available. I, for one, often do not.

Jan Heldal
  • 148
  • 6
  • 1
    Is there any point to this in 2020? `BitOperations.LeadingZeroCount` should be faster if it JITs efficiently. Or equal if compiling for a target architecture that doesn't have hardware bitscan. I could imagine this being faster for a compile-time-constant input if C# doesn't do constant-propagation through the bitops version, but hopefully it can. – Peter Cordes Aug 07 '20 at 16:26
  • 1
    @Peter Cordes: There are many variants of .NET platforms, and not all of them have access to the BitOperations class. At our company, we are still using legacy "Portable" projects for some products, and System.Numerics.BitOperations simply does not exist. – Jan Heldal Aug 14 '20 at 12:01
  • Your answer does not give the correct result for zero. You will end up with 1+32+16+8+4+2-0=63 and not with 64. – Arnaud Apr 06 '21 at 07:47
1

Since we're talking about .NET here, it's usually preferable not to resort to external native calls. But if you can tolerate the overhead of a managed/unmanaged roundtrip for each operation, the following two calls provide pretty direct and unadulterated access to the native CPU instructions.

The (minimalistic) disassembly of the respective entire functions from ntdll.dll are also shown for each. That library will be present on any Windows machine, and will always be found, if referenced as shown.

Least-significant bit (LSB):

[DllImport("ntdll"), SuppressUnmanagedCodeSecurity]
public static extern int RtlFindLeastSignificantBit(ulong ul);

// X64:
//      bsf rdx, rcx
//      mov eax, 0FFFFFFFFh
//      movzx ecx, dl
//      cmovne eax,ecx
//      ret

Most-significant bit (MSB):

[DllImport("ntdll"), SuppressUnmanagedCodeSecurity]
public static extern int RtlFindMostSignificantBit(ulong ul);

// X64:
//      bsr rdx, rcx
//      mov eax, 0FFFFFFFFh
//      movzx ecx, dl
//      cmovne eax,ecx
//      ret

Usage:
Here's a usage example which requires that the above declarations be accessible. Couldn't be simpler.

int ix;

ix = RtlFindLeastSignificantBit(0x00103F0A042C1D80UL);  // ix --> 7

ix = RtlFindMostSignificantBit(0x00103F0A042C1D80UL);   // ix --> 52
Glenn Slayden
  • 17,543
  • 3
  • 114
  • 108