9

I am translating a program written in C++ to C#, and I have come across an intrinsic function that I cannot work around. In C++ this is known as:

unsigned char _BitScanForward(unsigned long * Index, unsigned long Mask);

If I only knew what DLL, if any, the intrinsic functions were in, I could use P/Invoke. Since I do not know, I looked for alternatives in the .NET Framework, but I have come up empty-handed.

Does anyone know how to use P/Invoke on _BitScanForward, or a .NET method that does the same thing?

Any help is appreciated, thank you.

M.Babcock
SvalinnAsgard

5 Answers

7

Intrinsic functions aren't in any library. They're implemented inside the CPU: the compiler emits machine code that the CPU recognizes as invoking this particular behavior.

They're a way of getting access to instructions that don't have a simple C equivalent.

Until the .NET optimizer becomes smart enough to recognize them, your C# code is doomed to run an order of magnitude slower than the original C++. (There is precedent for such recognition: the Mono JIT recognizes some SIMD instructions, encoded in MSIL as calls to functions of a particular class, and the .NET JIT similarly replaces calls to System.Math methods with floating-point operations.)

Ben Voigt
4

Wow, it looks like there is a C# question that hasn't yet been updated to cover the recent improvements.

Other commenters have correctly noted that intrinsics like _BitScanForward are not functions per se; they are markers telling the compiler to inject a specific platform instruction into the object code. It is impossible to emulate an intrinsic in a high-level language (unless you're willing to pay an abstraction penalty). The good news, however, is that starting with .NET Core 3.0 the JIT supports intrinsics for a number of hardware platforms.

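For _BitScanForward, you can use System.Runtime.Intrinsics.X86.Bmi1.TrailingZeroCount.

Caveat: Don't forget to check Bmi1.IsSupported before using it, otherwise the call throws PlatformNotSupportedException at run time. Also note that TrailingZeroCount returns 32 for a zero input, whereas _BitScanForward returns 0 and leaves the index undefined.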
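
A minimal sketch of that check, wrapped to mimic the shape of the C++ intrinsic. The class and method names (BitScan, TryBitScanForward) are made up for illustration. The fallback branch assumes you are happy to use System.Numerics.BitOperations.TrailingZeroCount, which has been part of the base library since .NET Core 3.0.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration; not part of .NET.
public static class BitScan
{
    // Mirrors _BitScanForward: returns false when mask is 0,
    // in which case index carries no meaningful value.
    public static bool TryBitScanForward(uint mask, out int index)
    {
        if (mask == 0)
        {
            index = 0;
            return false;
        }

        // Guard the hardware intrinsic; fall back to the portable helper.
        index = Bmi1.IsSupported
            ? (int)Bmi1.TrailingZeroCount(mask)
            : BitOperations.TrailingZeroCount(mask);
        return true;
    }
}
```

Usage: BitScan.TryBitScanForward(8u, out var i) returns true and sets i to 3.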

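You could also get decent execution speed on ARM (.NET 5.0+) by using its leading-zero-count intrinsic. Isolating the lowest set bit with x & -x and subtracting the number of leading zeros from the top bit index gives the zero-based position (use 32 and 64 instead of 31 and 63 if you want the one-based ffs convention):

public int ArmBitScanForward(int x)
  => 31 - System.Runtime.Intrinsics.Arm.ArmBase.LeadingZeroCount(x & -x);
public int ArmBitScanForward(long x)
  => 63 - System.Runtime.Intrinsics.Arm.ArmBase.Arm64.LeadingZeroCount(x & -x);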

If neither platform is present, you have to resort to bit-twiddling hacks such as de Bruijn sequences:

for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] ← i  // table [0..31] initialized
function ctz5 (x)
    return table[((x & -x) * 0x077CB531) >> 27]

(taken from https://en.wikipedia.org/wiki/Find_first_set)

How best to select the algorithm at run time depends on the constraints of the task. Branching on every call is likely to kill any efficiency gain. The most efficient way is to branch one level higher, i.e. to have three versions of your code to choose from at run time. An easy way to automate the code generation is to write your code in a generic form, parameterized with a bit-handling type:

public interface IBitScanner
{
  int BitScanForward(int x);
}

public int MyFunction<T>(int[] data)
  where T : struct, IBitScanner   // struct constraint: the JIT specializes the code per T
{
  var s = 0;
  var scanner = new T();
  foreach (var i in data)
    s += scanner.BitScanForward(i);
  return s;
}

Then we define a couple of structs implementing our scanner:

public struct BitScannerX86 : IBitScanner
{
  // Requires Bmi1.IsSupported; returns 32 for x == 0.
  public int BitScanForward(int x)
    => unchecked((int)System.Runtime.Intrinsics.X86.Bmi1.TrailingZeroCount((uint)x));
}
public struct BitScannerArm : IBitScanner
{
  // Requires ArmBase.IsSupported; returns -1 for x == 0.
  public int BitScanForward(int x)
    => 31 - System.Runtime.Intrinsics.Arm.ArmBase.LeadingZeroCount(x & -x);
}
public struct BitScanner : IBitScanner
{
  private static readonly int[] _table = InitTable();
  private static int[] InitTable()
  {
    var table = new int[32];
    // Unsigned arithmetic keeps the shifted index in the range 0..31.
    for (var i = 0; i < table.Length; i++)
      table[(0x077CB531u << i) >> 27] = i;
    return table;
  }
  // Portable de Bruijn fallback; returns 0 for x == 0.
  public int BitScanForward(int x)
    => _table[unchecked(((uint)(x & -x) * 0x077CB531u) >> 27)];
}

Now whenever we need a platform-specific version of MyFunction, we get it via MyFunction<BitScannerArm>. Because the type argument is a struct, the JIT is forced to generate code specific to it instead of shared generic code that goes through a virtual call. Then, as T is known at JIT time, the call to BitScanForward gets inlined and ends up with the appropriate intrinsic injected into the loop. Depending on the size of the task MyFunction performs, this version of MyFunction might be saved to a delegate, be part of an interface, or be part of a struct that implements an interface to repeat the trick one level higher.
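
The "saved to a delegate" option could look roughly like the sketch below. All type and member names here (X86Scanner, ArmScanner, PortableScanner, Dispatcher, SumIndices, SumIndicesFast) are hypothetical, and the portable scanner uses System.Numerics.BitOperations as a stand-in for the table-based fallback to keep the example short. Inputs are assumed to be nonzero, because the three scanners disagree on what they return for zero.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

public interface IBitScanner { int BitScanForward(int x); }

public struct X86Scanner : IBitScanner
{
    public int BitScanForward(int x) => (int)Bmi1.TrailingZeroCount((uint)x);
}

public struct ArmScanner : IBitScanner
{
    public int BitScanForward(int x) => 31 - ArmBase.LeadingZeroCount(x & -x);
}

public struct PortableScanner : IBitScanner
{
    // Stand-in for a table-based fallback.
    public int BitScanForward(int x) => BitOperations.TrailingZeroCount(x);
}

public static class Dispatcher
{
    // Generic worker: the JIT compiles a separate body for each struct T.
    public static int SumIndices<T>(int[] data) where T : struct, IBitScanner
    {
        var scanner = default(T);
        var sum = 0;
        foreach (var value in data)
            sum += scanner.BitScanForward(value);
        return sum;
    }

    // The platform check runs once, when the delegate is created,
    // rather than on every BitScanForward call.
    public static readonly Func<int[], int> SumIndicesFast = Select();

    private static Func<int[], int> Select()
    {
        if (Bmi1.IsSupported) return SumIndices<X86Scanner>;
        if (ArmBase.IsSupported) return SumIndices<ArmScanner>;
        return SumIndices<PortableScanner>;
    }
}
```

Callers then use Dispatcher.SumIndicesFast(data) and pay one delegate invocation per array instead of one branch per element.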

Note that the original question did not concern itself with cross-platform compatibility, since _BitScanForward maps to an x86 instruction (bsf). That was probably fine in the C++ world of compiling an executable for a specific OS and hardware combination; contemporary managed code like Java/.NET has a chance of being executed anywhere.

3

The _BitScanForward C++ function is a compiler intrinsic. It searches a 32-bit mask for the first set bit, from the lowest-order bit to the highest, and returns the position of that bit through the Index parameter. You could probably implement something similar using bit manipulation in C# (though it is unlikely to come close to the same performance). If you're comfortable with bit manipulation in C++, then it's basically the same in C#.

M.Babcock
3

_BitScanForward searches for the first set bit in an integer, starting from the least significant bit searching towards the most significant bit. It compiles to the bsf instruction on the x86 platform.

The Bit Twiddling Hacks page includes a handful of potential replacement algorithms that excel in different situations. There's an O(N) function (which, for uniformly distributed inputs, returns after a single iteration half the time), some sub-linear options, and some that make use of multiplication steps. Picking one might not be trivial, but any should work.
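
As a rough illustration of the O(N) variant, here is a plain linear scan. It is a sketch written for this answer, not code taken from that page, and the class name NaiveBitScan and the -1 return value for a zero mask are assumptions.

```csharp
// Hypothetical helper for illustration.
public static class NaiveBitScan
{
    // Returns the zero-based index of the lowest set bit, or -1 when mask is 0.
    public static int BitScanForward(uint mask)
    {
        if (mask == 0)
            return -1;

        var index = 0;
        // Shift right until the lowest bit is set. Odd inputs (half of all
        // uniformly distributed values) never enter the loop body.
        while ((mask & 1) == 0)
        {
            mask >>= 1;
            index++;
        }
        return index;
    }
}
```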

sarnold
2

It is not possible to P/Invoke _BitScanForward because it is a compiler intrinsic, not an actual library function (it gets translated by the Visual C++ compiler to a BSF x86 machine instruction). As far as I'm aware, there is no MSIL instruction for this "find first set" operation. The simplest thing to do is write your own C++ native DLL that exports a function that invokes _BitScanForward(), and then P/Invoke that.

You can also write it directly in C# using bit manipulation (see Algorithms for find first set in Wikipedia). I'm not sure if this would be faster or slower than P/Invoke. Measure and find out.

Chiara Coetzee