
I want to measure the context switch time, and I am thinking of using a mutex and condition variables to signal between 2 threads so that only one thread runs at a time. I can use CLOCK_MONOTONIC to measure the entire execution time and CLOCK_THREAD_CPUTIME_ID to measure how long each thread runs.
The context switch time is then (total_time - thread_1_time - thread_2_time). To get a more accurate result, I can loop over it and take the average.

Is this a correct way to approximate the context switch time? I can't think of anything that might go wrong, but I am getting answers that are under 1 nanosecond.

I forgot to mention that the more iterations I loop over before taking the average, the smaller the results get.

Edit

Here is a snippet of the code that I have:

    typedef struct
    {
      struct timespec start;
      struct timespec end;
    }thread_time;

    ...


    // each thread function looks similar to this
    void* thread_1_func(void* arg)
    {
        // the variable must not reuse the type name thread_time
        thread_time* times = (thread_time*) arg;
        int x;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(times->start));
        for(x = 0; x < loop; ++x)
        {
            // where it switches to another thread
        }
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &(times->end));

        return NULL;
    }

    void* thread_2_func(void* arg)
    {
        // similar to the above
    }

    int main()
    {
        ...
        pthread_t thread_1;
        pthread_t thread_2;

        thread_time thread_1_time;
        thread_time thread_2_time;

        struct timespec start, end;

        // stamps the start time
        clock_gettime(CLOCK_MONOTONIC, &start);

        // create two threads with the time structs as the arguments
        pthread_create(&thread_1, NULL, &thread_1_func, (void*) &thread_1_time);
        pthread_create(&thread_2, NULL, &thread_2_func, (void*) &thread_2_time);
        // waits for the two threads to terminate
        pthread_join(thread_1, NULL);
        pthread_join(thread_2, NULL);

        // stamps the end time
        clock_gettime(CLOCK_MONOTONIC, &end);

        // then I calculate the difference between the total execution time and the combined execution time of the two threads
    }
GalaxyVintage

2 Answers


First of all, using CLOCK_THREAD_CPUTIME_ID is probably very wrong; this clock gives the CPU time consumed by that thread alone. However, the context switch happens in the kernel rather than in the thread's own user-mode code, so you'd want to use another clock. Also, on multiprocessing systems the clocks can give different values from one processor to another! Thus I suggest you use CLOCK_REALTIME or CLOCK_MONOTONIC instead. Be warned, however, that even if you read either of these twice in rapid succession, the timestamps will usually be tens of nanoseconds apart already.
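To make the two points about clocks concrete, here is a minimal sketch (my own illustration, not code from the question). The helper names `now_ns`, `clock_read_gap_ns` and `compare_clocks_across_sleep` are hypothetical. The first function averages the gap between two back-to-back CLOCK_MONOTONIC reads; the second shows that CLOCK_THREAD_CPUTIME_ID barely advances while the thread is blocked, whereas CLOCK_MONOTONIC keeps counting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in nanoseconds. */
static int64_t now_ns(clockid_t clock)
{
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average gap between two back-to-back CLOCK_MONOTONIC reads.
 * This is the floor below which a single measurement means little. */
static double clock_read_gap_ns(int samples)
{
    int64_t total = 0;
    for (int i = 0; i < samples; ++i) {
        int64_t a = now_ns(CLOCK_MONOTONIC);
        int64_t b = now_ns(CLOCK_MONOTONIC);
        total += b - a;
    }
    return (double)total / samples;
}

/* Block the calling thread for sleep_ms milliseconds and report how far
 * the wall clock and the per-thread CPU clock advanced meanwhile. */
static void compare_clocks_across_sleep(long sleep_ms,
                                        int64_t *wall_ns, int64_t *cpu_ns)
{
    struct timespec nap = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
    int64_t wall0 = now_ns(CLOCK_MONOTONIC);
    int64_t cpu0  = now_ns(CLOCK_THREAD_CPUTIME_ID);
    nanosleep(&nap, NULL);   /* the thread is off the CPU here */
    *cpu_ns  = now_ns(CLOCK_THREAD_CPUTIME_ID) - cpu0;
    *wall_ns = now_ns(CLOCK_MONOTONIC) - wall0;
}
```

Calling `compare_clocks_across_sleep(50, &wall, &cpu)` from `main` should report a wall-clock delta of at least 50 ms and a much smaller CPU-time delta. Printing `clock_read_gap_ns(100000)` gives a rough idea of the timer overhead on your machine.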


As for context switches, there are many kinds. The fastest approach is to switch from one thread to another entirely in software. This just means that you push the old registers on the stack, set the task-switched flag (the TS bit in CR0) so that SSE/FP registers will be saved lazily, save the stack pointer, load the new stack pointer and return from that function. Since the other thread had done the same, the return from that function happens in the other thread.
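The "return happens in the other thread" idea can be sketched in user space with the `ucontext` API that glibc provides. This is only an illustration of the save-registers/swap-stack pattern, not what the kernel scheduler does internally. The names `task`, `run_switch_demo` and `trace` are hypothetical, and on Linux `swapcontext` typically also saves and restores the signal mask, so it is slower than a bare register swap.

```c
#define _GNU_SOURCE
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];   /* private stack for the second context */
static int trace[4];                 /* records the order in which code ran */
static int trace_len = 0;

static void task(void)
{
    trace[trace_len++] = 2;              /* running on task_stack */
    swapcontext(&task_ctx, &main_ctx);   /* save our registers, resume main */
    trace[trace_len++] = 4;              /* resumed by main's second switch */
    /* returning from here resumes uc_link, i.e. main_ctx */
}

/* Switch back and forth between two contexts.
 * Returns the number of trace entries, or -1 if getcontext fails. */
static int run_switch_demo(void)
{
    trace_len = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof(task_stack);
    task_ctx.uc_link = &main_ctx;        /* where to go when task() returns */
    makecontext(&task_ctx, task, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &task_ctx);   /* this call "returns" inside task() */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx);   /* resume task after its swapcontext */
    return trace_len;
}
```

After `run_switch_demo()` the `trace` array holds 1, 2, 3, 4: each `swapcontext` call made in one context comes back in the other.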

This thread-to-thread switch is quite fast; its overhead is about the same as for any system call. Switching from one process to another is much slower, because the user-space page tables must be switched by loading the CR3 register. That flushes the TLB, which caches virtual-to-physical address translations, so the new process starts out with TLB misses.


However, a context switch/system call overhead of <1 ns does not really seem plausible. It is very probable that there is either hyperthreading or 2 CPU cores here, so I suggest that you set the CPU affinity on that process so that Linux only ever runs it on, say, the first CPU core:

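    #define _GNU_SOURCE   /* exposes cpu_set_t and the CPU_* macros */
    #include <sched.h>

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);    /* allow CPU 0 only */
    /* pid 0 means the calling thread; returns -1 and sets errno on failure */
    int result = sched_setaffinity(0, sizeof(mask), &mask);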

Then you can be fairly sure that the time you're measuring comes from a real context switch. Also, to measure the time for switching the floating point / SSE state (this happens lazily), you should have some floating point variables and do calculations on them prior to the context switch, then add, say, .1 to some volatile floating point variable after the context switch to see if it has an effect on the switching time.
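Putting the pieces together, here is a minimal sketch of a pinned ping-pong measurement, assuming Linux with glibc. It is my own illustration, not the asker's code: `ping_pong` and `measure_handoff_ns` are hypothetical names, and instead of hard-coding CPU 0 it pins to the first CPU the process is allowed to use. The result includes mutex and condition variable overhead, so treat it as an upper bound on the switch cost rather than an exact figure.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int  turn   = 0;   /* id (0 or 1) of the thread allowed to run */
static long rounds = 0;   /* hand-offs performed by each thread */

/* Wait for our turn, then hand control to the other thread. */
static void *ping_pong(void *arg)
{
    int me = (int)(intptr_t)arg;
    for (long i = 0; i < rounds; ++i) {
        pthread_mutex_lock(&lock);
        while (turn != me)
            pthread_cond_wait(&cond, &lock);
        turn = 1 - me;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Average wall-clock nanoseconds per hand-off, or -1.0 on failure. */
static double measure_handoff_ns(long n)
{
    cpu_set_t orig, one;
    struct timespec t0, t1;
    pthread_t peer;
    int cpu = 0;

    if (n <= 0 || sched_getaffinity(0, sizeof(orig), &orig) != 0)
        return -1.0;
    while (cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &orig))
        ++cpu;                             /* first CPU we may run on */
    if (cpu == CPU_SETSIZE)
        return -1.0;
    CPU_ZERO(&one);
    CPU_SET(cpu, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1.0;

    turn = 0;
    rounds = n;
    /* The new thread inherits the one-CPU affinity mask. */
    if (pthread_create(&peer, NULL, ping_pong, (void *)(intptr_t)1) != 0) {
        sched_setaffinity(0, sizeof(orig), &orig);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ping_pong((void *)(intptr_t)0);        /* the caller plays thread 0 */
    pthread_join(peer, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    sched_setaffinity(0, sizeof(orig), &orig);   /* restore the old mask */

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (2.0 * n);                 /* 2n hand-offs in total */
}
```

Call `measure_handoff_ns(100000)` from `main` and print the result, compiling with `-pthread`. Because both threads share one CPU, each hand-off forces the waiting thread to be scheduled in, so the figure cannot collapse toward zero the way the subtraction approach does.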

  • '<1 ns context switch/system call overhead does not really seem plausible' lol yes, I would have gone with 'impossible'. – Martin James Mar 18 '16 at 08:24
  • I still don't get why `CLOCK_THREAD_CPUTIME_ID` doesn't work. If I call `clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start)` at the beginning of a thread and `clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end)` at the end of the thread, shouldn't the difference be the total time difference? – GalaxyVintage Mar 18 '16 at 18:45
  • I have tried setting the cpu to 0 and the result I am getting is still extremely small. I have included a snippet of my code and I am wondering if there is anything wrong..... – GalaxyVintage Mar 18 '16 at 23:48

This is not straightforward, but as usual someone has already done a lot of work on this. (I'm not including the source here because I cannot see any license mentioned.)

https://github.com/tsuna/contextswitch/blob/master/timetctxsw.c

If you copy that file to a Linux machine (as context_switch_time.c), you can compile and run it like this:

    gcc -D_GNU_SOURCE -Wall -O3 -std=c11 context_switch_time.c -lpthread
    ./a.out

I got the following result on a small VM:

    2000000  thread context switches in 2178645536ns (1089.3ns/ctxsw)

This question has come up before. For Linux, you can find some material here:

Write a C program to measure time spent in context switch in Linux OS

Note that while the user in the above link was running the test, they were also loading the machine with games and compilation, which is why the context switches took so long. Some more info here:

how can you measure the time spent in a context switch under java platform

Harry
  • The first link has an accepted answer with 23 votes. It suggests times on the order of 10-20ms. That is just ridiculously high, and typical of the rubbish that seems to infest the multithreading tag :( – Martin James Mar 18 '16 at 08:22