
My processor (Intel i7) supports the POPCNT instruction and I would like to call it from my C# application. Is this possible?

I believe I read somewhere that this isn't possible, but that the JIT will emit the instruction itself if the CPU supports it. If so, what function would I have to call so that it gets replaced with that instruction?

Popcount is called millions of times in a loop, so I'd like to get this CPU optimization if possible.
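For reference, the usual pure-managed way to count set bits is the SWAR ("SIMD within a register") trick sketched below; the class name `ManagedPopCount` is only illustrative. It does not use POPCNT, but it is branch-free and needs no interop. On .NET Core 3.0 and later, `System.Numerics.BitOperations.PopCount` is the method the JIT compiles to POPCNT when the CPU supports it; that API did not exist when this question was asked.

```csharp
// Illustrative helper: portable bit count without POPCNT.
static class ManagedPopCount
{
    public static int PopCount(uint x)
    {
        x = x - ((x >> 1) & 0x55555555u);                 // each 2-bit field = count of its bits
        x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); // each 4-bit field = count of its bits
        x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 // each byte = count of its bits
        return (int)((x * 0x01010101u) >> 24);            // top byte = sum of the four bytes
    }
}
```

For example, `ManagedPopCount.PopCount(0xF0u)` returns 4 and `ManagedPopCount.PopCount(0xFFFFFFFFu)` returns 32.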

Ryan Peschel
  • Is C# the right language for this? I thought we used languages like C# so we don't have to think (that hard) about CPU instructions. – Doug Dawson Mar 13 '15 at 19:40
  • No, it is not. However, I prefer working with C#. – Ryan Peschel Mar 13 '15 at 19:41
  • This question has been asked and answered on Stack Overflow: http://stackoverflow.com/questions/6097635/checking-cpu-popcount-from-c-sharp – Kyle Williamson Mar 13 '15 at 19:42
  • @KyleWilliamson that question is about how to determine if the CPU supports the instruction, not how to call it. – crashmstr Mar 13 '15 at 19:44
  • "How do I hammer in this nail with a screwdriver? I know it's the wrong tool, but I hate hammers and love screwdrivers." If you need to do this then you need to use a different language. If that is not obvious to you then I'm afraid you will likely mess up the implementation anyway. – Ed S. Mar 13 '15 at 19:47
  • Ah, you are correct. Sorry about that... The post may still be relevant, since it says that C# can't do CPU-level optimizations: "The JIT compiler in the common language runtime is able to do some optimization when the code is actually run, but there is no direct access to that process from the language itself." – Kyle Williamson Mar 13 '15 at 19:48
  • [This question](http://stackoverflow.com/a/9090090/995714) also has some related information. [Another way](https://adnanboz.wordpress.com/2011/02/26/how-to-use-cpu-instructions-in-c-to-gain-performace/) is writing the bottleneck part in unmanaged C++ – phuclv Mar 13 '15 at 19:52
  • Do you want to use the instruction yourself, or do you want .NET to use SSE 4 for optimizations? – AK_ Mar 13 '15 at 20:31



You want to play with fire, and here we like to play with fire...

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

class Program
{
    const uint PAGE_EXECUTE_READWRITE = 0x40;
    const uint MEM_COMMIT = 0x1000;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, uint flProtect);

    // Delegate type matching the signature of the generated code: int f()
    private delegate int IntReturner();

    static void Main(string[] args)
    {
        // Machine code for: mov eax, 42 / ret
        List<byte> bodyBuilder = new List<byte>();
        bodyBuilder.Add(0xb8); // MOV EAX,
        bodyBuilder.AddRange(BitConverter.GetBytes(42)); // 42
        bodyBuilder.Add(0xc3);  // RET
        byte[] body = bodyBuilder.ToArray();

        // Allocate a read/write/execute page and copy the code into it
        // (no error checking, and the memory is never freed)
        IntPtr buf = VirtualAlloc(IntPtr.Zero, (IntPtr)body.Length, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
        Marshal.Copy(body, 0, buf, body.Length);

        // Wrap the raw address in a delegate and call it
        IntReturner ptr = (IntReturner)Marshal.GetDelegateForFunctionPointer(buf, typeof(IntReturner));
        Console.WriteLine(ptr());
    }
}
```

(this small example of assembly will simply return 42... I think it's the perfect number for this answer :-) )

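In the end, the trick is:

A) You must know the opcodes corresponding to the asm you want to write

B) You use VirtualAlloc to make a page of memory executable

C) You copy your opcodes there (here with Marshal.Copy)

D) You wrap the address in a delegate with Marshal.GetDelegateForFunctionPointer and call it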

(the code was taken from http://www.cnblogs.com/netact/archive/2013/01/10/2855448.html)
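Step A is where POPCNT comes in. Using the encodings harold gives in the comments, a sketch of the code bytes for a native `int popcnt(uint x)` could look like this. The class name `PopcntStub` is hypothetical. The x86 variant assumes the default calling convention that GetDelegateForFunctionPointer uses on 32-bit Windows (stdcall, when the delegate has no `[UnmanagedFunctionPointer]` attribute), which is why it ends with `ret 4` rather than a plain `ret`.

```csharp
// Hypothetical helper: machine code for a native "int popcnt(uint x)".
static class PopcntStub
{
    public static byte[] Build(bool is64Bit)
    {
        if (is64Bit)
        {
            // Windows x64 calling convention: the first argument arrives in ECX.
            return new byte[]
            {
                0xF3, 0x0F, 0xB8, 0xC1,             // popcnt eax, ecx
                0xC3                                // ret
            };
        }

        // 32-bit: the argument is on the stack at [esp+4]. Under stdcall the
        // callee pops its own 4-byte argument, hence "ret 4".
        return new byte[]
        {
            0xF3, 0x0F, 0xB8, 0x44, 0x24, 0x04,     // popcnt eax, [esp+4]
            0xC2, 0x04, 0x00                        // ret 4
        };
    }
}
```

To try it, assuming the CPU has already been checked for POPCNT support, pass `PopcntStub.Build(IntPtr.Size == 8)` to the VirtualAlloc code in place of the `mov eax, 42` body and call it through a delegate declared as `private delegate int PopCountFn(uint x);`. Every call through a marshaled delegate is a managed-to-native transition, which can easily cost more than the single instruction it runs, so with millions of calls in a loop it is worth measuring against a managed implementation.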

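OK... the first version is as written on that site (except that I fixed dwSize, which must be an IntPtr, not a uint). This one is how it should be written, or at least it is a step up from the original: it checks for errors, writes the code into read/write memory and only then marks it executable, and frees the memory at the end. (I would actually encapsulate everything in an IDisposable class instead of using try... finally.)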

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.InteropServices;

class Program
{
    const uint PAGE_READWRITE = 0x04;
    const uint PAGE_EXECUTE = 0x10;
    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RELEASE = 0x8000;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool VirtualProtect(IntPtr lpAddress, IntPtr dwSize, uint flNewProtect, out uint lpflOldProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool VirtualFree(IntPtr lpAddress, IntPtr dwSize, uint dwFreeType);

    private delegate int IntReturner();

    static void Main(string[] args)
    {
        List<byte> bodyBuilder = new List<byte>();
        bodyBuilder.Add(0xb8); // MOV EAX,
        bodyBuilder.AddRange(BitConverter.GetBytes(42)); // 42
        bodyBuilder.Add(0xc3);  // RET

        byte[] body = bodyBuilder.ToArray();

        IntPtr buf = IntPtr.Zero;

        try
        {
            // We VirtualAlloc body.Length bytes, with R/W access
            // Note that from what I've read, MEM_RESERVE is useless
            // if the first parameter is IntPtr.Zero
            buf = VirtualAlloc(IntPtr.Zero, (IntPtr)body.Length, MEM_COMMIT, PAGE_READWRITE);

            if (buf == IntPtr.Zero)
            {
                throw new Win32Exception();
            }

            // Copy our instructions in the buf
            Marshal.Copy(body, 0, buf, body.Length);

            // Change the access of the allocated memory from R/W to Execute
            uint oldProtection;
            bool result = VirtualProtect(buf, (IntPtr)body.Length, PAGE_EXECUTE, out oldProtection);

            if (!result)
            {
                throw new Win32Exception();
            }

            // Create a delegate to the "function"
            // Sadly we can't use Func<int>: GetDelegateForFunctionPointer
            // rejects generic delegate types
            var fun = (IntReturner)Marshal.GetDelegateForFunctionPointer(buf, typeof(IntReturner));

            Console.WriteLine(fun());
        }
        finally
        {
            if (buf != IntPtr.Zero)
            {
                // Free the allocated memory
                bool result = VirtualFree(buf, IntPtr.Zero, MEM_RELEASE);

                if (!result)
                {
                    throw new Win32Exception();
                }
            }
        }
    }
}
```
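The answer mentions that wrapping everything in an IDisposable class would be cleaner than try... finally. A minimal sketch of that idea follows; the class name `ExecutableBuffer` and its `GetDelegate<T>` method are hypothetical, not an existing API. It uses `MEM_COMMIT | MEM_RESERVE` and `PAGE_EXECUTE_READ`, which are standard VirtualAlloc/VirtualProtect flags but differ slightly from the example above. Windows only.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Same delegate shape as in the answer: a native function "int f()".
public delegate int IntReturner();

// Hypothetical helper (not an existing API): owns one block of executable
// memory holding the given machine code and frees it on Dispose.
public sealed class ExecutableBuffer : IDisposable
{
    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint MEM_RELEASE = 0x8000;
    const uint PAGE_READWRITE = 0x04;
    const uint PAGE_EXECUTE_READ = 0x20;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, IntPtr dwSize, uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool VirtualProtect(IntPtr lpAddress, IntPtr dwSize, uint flNewProtect, out uint lpflOldProtect);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool VirtualFree(IntPtr lpAddress, IntPtr dwSize, uint dwFreeType);

    IntPtr _address;

    public ExecutableBuffer(byte[] code)
    {
        if (code == null || code.Length == 0)
            throw new ArgumentException("code must not be empty", nameof(code));

        // 1. Reserve and commit read/write memory (not executable yet).
        _address = VirtualAlloc(IntPtr.Zero, (IntPtr)code.Length, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (_address == IntPtr.Zero)
            throw new Win32Exception();

        // 2. Copy the machine code in.
        Marshal.Copy(code, 0, _address, code.Length);

        // 3. W^X: switch to execute/read, dropping write access.
        uint oldProtect;
        if (!VirtualProtect(_address, (IntPtr)code.Length, PAGE_EXECUTE_READ, out oldProtect))
        {
            var error = new Win32Exception(); // capture the error before VirtualFree can overwrite it
            Dispose();
            throw error;
        }
    }

    // Wraps the code in a delegate. T must be a non-generic delegate type
    // such as IntReturner, since GetDelegateForFunctionPointer rejects Func<>/Action<>.
    public T GetDelegate<T>() where T : class
    {
        if (_address == IntPtr.Zero)
            throw new ObjectDisposedException(nameof(ExecutableBuffer));
        return (T)(object)Marshal.GetDelegateForFunctionPointer(_address, typeof(T));
    }

    public void Dispose()
    {
        if (_address != IntPtr.Zero)
        {
            VirtualFree(_address, IntPtr.Zero, MEM_RELEASE);
            _address = IntPtr.Zero;
        }
    }
}
```

Usage would be `using (var code = new ExecutableBuffer(body)) { Console.WriteLine(code.GetDelegate<IntReturner>()()); }`. The delegate is only valid while the buffer is alive; calling it after Dispose will crash the process. A production version would probably also add a finalizer or a SafeHandle so that leaked instances still release their memory.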
xanatos
  • Better to call `VirtualProtect` after the copy, to add the X bit and remove W, since enforcing W^X seems to be good for security. – Ben Voigt Mar 13 '15 at 19:59
  • @BenVoigt I preferred to copy the example code verbatim... But yes, it's normally better to do as you said. – xanatos Mar 13 '15 at 20:00
  • `popcnt eax, [esp + 4]` would be `F3 0F B8 44 24 04` by the way, so you can throw that in. `F3 0F B8 C1` for `popcnt eax, ecx` (for win64 calling conventions) – harold Mar 13 '15 at 20:19
  • @BenVoigt Now that I've used some `try... finally` and the `VirtualProtect` I feel more... clean :) – xanatos Mar 13 '15 at 20:20
  • Why is the IntReturner needed? – Erti-Chris Eelmaa Mar 13 '15 at 21:40
  • @Chris You need a delegate to "point" to the asm function. And it can't be one of the `Func<>` or `Action<>` because `GetDelegateForFunctionPointer` doesn't like them – xanatos Mar 13 '15 at 21:50
  • @xanatos: Yeah, but is there an actual technical reason why that wouldn't be possible if someone decided to make GetDelegateForFunctionPointer accept them? Seems like quite useful functionality. – Erti-Chris Eelmaa Mar 13 '15 at 21:52
  • @ChrisEelmaa I don't know why Microsoft decided that supporting generic delegates in GetDelegateForFunctionPointer was too complex. – xanatos Mar 13 '15 at 21:58