5

I'm looking for a way to measure the cpu cycles a function call on a thread takes.

Example pseudo code:

void HostFunction()
{
     var startTick = CurrentThread.CurrentTick;  //does not exist

     ChildFunction();

     var endTick = CurrentThread.CurrentTick;  //does not exist

     var childFunctionCost = endTick - startTick;
}

void ChildFunction() 
{
    //Do whatever...

    Thread.Sleep(3000);

    //Do some more...
}

I don't want to use a Stopwatch or some other time measurement, because it would include any time that the thread is sleeping, which I do not want to measure. I only want to measure real work.

This measurement needs to work at runtime, as in my pseudo code, as the results are used to determine if the child function should be allowed to continue to run (my real case is a plugin-type architecture), so a profiling tool won't help me.

Nathan A
  • 11,059
  • 4
  • 47
  • 63
  • 3
    consider this solution (GetThreadTimes): http://stackoverflow.com/questions/26472936/why-does-getthreadtimes-return/26475906#26475906 (not the code, just the calls) The code is in serious need of help, but the breakdown of the results might be of interest. – Jeff Oct 28 '14 at 21:44
  • Is there anything that prevents the plug-in from spawning additional threads and/or using the threadpool? If not then basing scheduling on current thread times will not yield the desired result. – Brian Rasmussen Oct 28 '14 at 22:09
  • @BrianRasmussen, yes I am using Mono.Cecil to scan all the Type Refs, and if any namespaces/types that manage threads are used, it's automatically excluded. – Nathan A Oct 28 '14 at 22:13
  • Thank you all for your answers. I think I'll end up going with the solution by @HansPassant, as it appears to return the value in cycle units (not time units). For my case, my plugin spec requires a fixed number of allowed cycles for the child function call, regardless of the speed of the CPU, so based on my limited understanding on this subject, I'm figuring that cycles would be the better way to do that. – Nathan A Oct 28 '14 at 22:29

2 Answers2

13

You can pinvoke QueryThreadCycleTime(). Check the link for details.

Some sample code:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program {
    static void Main(string[] args) {
        ulong start, end;
        start = NativeMethods.GetThreadCycles();
        System.Threading.Thread.Sleep(1000);
        end = NativeMethods.GetThreadCycles();
        ulong cycles = end - start;
        Debug.Assert(cycles < 200000);
    }

    static class NativeMethods {
        public static ulong GetThreadCycles() {
            ulong cycles;
            if (!QueryThreadCycleTime(PseudoHandle, out cycles))
                throw new System.ComponentModel.Win32Exception();
            return cycles;
        }
        [DllImport("kernel32.dll", SetLastError = true)]
        private static extern bool QueryThreadCycleTime(IntPtr hThread, out ulong cycles);
        private static readonly IntPtr PseudoHandle = (IntPtr)(-2);

    }
}
Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Do you know if the value returned by this function is updated in (near) real-time or does it suffer the same issue that GetThreadTimes has namely that it only gets updated on a clock interrupt? – 500 - Internal Server Error Oct 28 '14 at 22:21
  • Easy to try for yourself, just call it twice. Yes. – Hans Passant Oct 28 '14 at 22:25
  • Good point. Sorry for being lazy. It does indeed get updated on every call, as far as I can tell. – 500 - Internal Server Error Oct 28 '14 at 22:41
  • 1
    @HansPassant Just out of curiosity, how did you come up with the PseudoHandle? Is there a documentation of that value? – Nathan A Oct 30 '14 at 21:27
  • No, it is secret knowledge. You can ask a question about it :) – Hans Passant Oct 30 '14 at 21:33
  • @HansPassant You are quite cruel... It took me at least 20 minutes to discover what *-2* is... It is quite hidden :-) and since hans was cruel, I'll tell everyone that *-2* is the what `GetCurrentThread()` always returns (see for example http://blogs.msdn.com/b/oldnewthing/archive/2014/10/15/10564700.aspx where it is explained in the comments) – xanatos Apr 15 '15 at 10:39
2

Based on the comment I provided above, consider the following code:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

namespace FunctionTiming
{
    class Program
    {
        private static Thread _thread;
        private static IntPtr _threadHandle;

        static void Main(string[] args)
        {
            _thread = new Thread(new ThreadStart(Program.TargetFunction));
            _thread.Start();
            _thread.Join();

            System.Runtime.InteropServices.ComTypes.FILETIME start, end, rawKernelTime, rawUserTime;
            bool result = GetThreadTimes(_threadHandle, out start, out end, out rawKernelTime, out rawUserTime);
            Debug.Assert(result);

            ulong uLow = (ulong)rawKernelTime.dwLowDateTime;
            ulong uHigh = (uint)rawKernelTime.dwHighDateTime;
            uHigh = uHigh << 32;
            long kernelTime = (long)(uHigh | uLow);

            uLow = (ulong)rawUserTime.dwLowDateTime;
            uHigh = (uint)rawUserTime.dwHighDateTime;
            uHigh = uHigh << 32;
            long userTime = (long)(uHigh | uLow);

            Debug.WriteLine("Kernel time: " + kernelTime);
            Debug.WriteLine("User time: " + userTime);
            Debug.WriteLine("Combined raw execution time: " + (kernelTime + userTime));

            long functionTime = (kernelTime + userTime) / 10000;
            Debug.WriteLine("Funciton Time: " + functionTime + " milliseconds");
        }

        static void TargetFunction()
        {
            IntPtr processHandle = GetCurrentProcess();
            bool result = DuplicateHandle(processHandle, GetCurrentThread(), processHandle, out _threadHandle, 0, false, (uint)DuplicateOptions.DUPLICATE_SAME_ACCESS);

            double value = 9876543.0d;
            for (int i = 0; i < 100000; ++i)
                value = Math.Cos(value);

            Thread.Sleep(3000);

            value = 9876543.0d;
            for (int i = 0; i < 100000; ++i)
                value = Math.Cos(value);
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool GetThreadTimes(IntPtr hThread,
            out System.Runtime.InteropServices.ComTypes.FILETIME lpCreationTime, out System.Runtime.InteropServices.ComTypes.FILETIME lpExitTime,
            out System.Runtime.InteropServices.ComTypes.FILETIME lpKernelTime, out System.Runtime.InteropServices.ComTypes.FILETIME lpUserTime);

        [DllImport("kernel32.dll")]
        private static extern IntPtr GetCurrentThread();

        [DllImport("kernel32.dll", SetLastError = true)]
        [return: MarshalAs(UnmanagedType.Bool)]
        static extern bool DuplicateHandle(IntPtr hSourceProcessHandle,
            IntPtr hSourceHandle, IntPtr hTargetProcessHandle, out IntPtr lpTargetHandle,
            uint dwDesiredAccess, [MarshalAs(UnmanagedType.Bool)] bool bInheritHandle, uint dwOptions);

        [Flags]
        public enum DuplicateOptions : uint
        {
            DUPLICATE_CLOSE_SOURCE = (0x00000001),// Closes the source handle. This occurs regardless of any error status returned.
            DUPLICATE_SAME_ACCESS = (0x00000002), //Ignores the dwDesiredAccess parameter. The duplicate handle has the same access as the source handle.
        }

        [DllImport("kernel32.dll")]
        static extern IntPtr GetCurrentProcess();
    }
}

which produces the following result (on my older machine):

Kernel time: 0
User time: 156250
Combined raw execution time: 156250
Function time: 15 milliseconds

You can clearly see that the 3 seconds of sleeping is not being included. Hope this helps.

Jeff
  • 2,495
  • 18
  • 38
  • 2
    You can't get better resolution than the clock tick interrupt rate. Default is 64 interrupts/second. Or 15.625 msec, as you can tell. – Hans Passant Oct 28 '14 at 22:15
  • @Jeff, Thanks for your answer. Compared to Hans' answer though, it looks rather complex. Is there any benefit to using this one compared to his? Hans, same question. – Nathan A Oct 28 '14 at 22:17
  • @NathanA - Both strategies can provide sufficient results. And the code above is verbose, and can be abstracted how you like. The intent was simply to provide straight forward code that will compile that you can test with. – Jeff Oct 28 '14 at 22:21
  • @NathanA - After testing with the code Hans provided, I agree. It is a better solution, and I am glad that he provided it. – Jeff Oct 28 '14 at 22:36