2

I'm trying to get the elapsed time of my program. I thought I should use clock() from time.h, but it stays zero in all phases of the program although I'm adding 10^5 numbers (some CPU time must be consumed). I already searched for this problem, and it seems that only people running Linux have this issue. I'm running Ubuntu 12.04 LTS.

I'm going to compare AVX and SSE instructions, so time_t, with its one-second resolution, is not really an option. Any hints?

Here is the code:

#include <ctime>
#include <iostream>
using namespace std;

int main()
{
    // Dimension of the arrays
    const unsigned int N = 100000;

    // Fill two arrays with their index values
    unsigned int a[N];
    clock_t start_of_programm = clock();
    for (unsigned int i = 0; i < N; i++) {
        a[i] = i;
    }
    clock_t after_init_of_a = clock();
    unsigned int b[N];
    for (unsigned int i = 0; i < N; i++) {
        b[i] = i;
    }
    clock_t after_init_of_b = clock();

    // Add the two arrays with standard (scalar) code
    unsigned int out[N];
    for (unsigned int i = 0; i < N; ++i)
        out[i] = a[i] + b[i];
    clock_t after_add = clock();

    // Print the raw clock() values taken after each phase
    cout << "start_of_programm " << start_of_programm << endl;
    cout << "after_init_of_a " << after_init_of_a << endl;
    cout << "after_init_of_b " << after_init_of_b << endl;
    cout << "after_add " << after_add << endl;
    cout << endl << "CLOCKS_PER_SEC " << CLOCKS_PER_SEC << endl;
    return 0;
}

And here is the console output. I also tried printf() with %d, which made no difference.

start_of_programm 0
after_init_of_a 0
after_init_of_b 0
after_add 0

CLOCKS_PER_SEC 1000000
Appleshell
toebs
  • "Im trying to compute the time of my process running"... you mean elapsed time? `clock()` returns the amount of CPU time the process has used. – trojanfoe Sep 09 '13 at 10:54
  • True. I'm not a native speaker. Anyway, any suggestions for solving my problem? – toebs Sep 09 '13 at 10:57
  • What if you put two more `0`s on `N`? – trojanfoe Sep 09 '13 at 10:58
  • 1
    If you have access to C++11 (any recent compiler), try using `std::chrono` instead of C time functions. – Appleshell Sep 09 '13 at 10:58
  • @toebs, if you're going to compare AVX and SSE code then you're going to probably want to use 32 byte aligned memory as well as larger array sizes. In that case allocating your arrays on the stack won't work. I would use `_mm_malloc(sizeof(int)*N, 32)`. – Z boson Sep 09 '13 at 13:31
  • @redrum: yes, this would be the next step, but first I had to figure out how to measure the elapsed time. I still need to compare everything with non-SSE/AVX instructions. – toebs Sep 09 '13 at 15:29
  • @toebs, see my answer for computing the measured elapsed time. I don't think you'll find a simpler answer (and one that works on windows and MSVC as well). – Z boson Sep 09 '13 at 15:32
  • @redrum: everything is going to run on Linux only, so I don't have to care about Windows :) – toebs Sep 09 '13 at 15:42

4 Answers

5

clock() does indeed return the CPU time used, but the granularity is on the order of 10Hz. So if your code doesn't take more than 100ms, you will get zero. And unless it runs significantly longer than 100ms, you won't get a very accurate value, because your error margin will be around 100ms.
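
One way to check the granularity on a particular system is to spin until the value returned by clock() changes and look at the size of that step. The following is only a minimal sketch and not part of the original answer: the helper name `observed_clock_step` is made up for illustration, and the value it reports depends on the platform and C library.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (name chosen for this sketch): busy-waits until
// clock() reports a new value and returns the size of that step in
// seconds, i.e. the granularity observed on this system.
double observed_clock_step()
{
    const std::clock_t first = std::clock();
    // clock() returns (clock_t)-1 when CPU time is not available.
    assert(first != static_cast<std::clock_t>(-1));

    std::clock_t next = std::clock();
    while (next == first)      // burn CPU time until the value moves
        next = std::clock();

    return static_cast<double>(next - first) / CLOCKS_PER_SEC;
}
```

If the returned step is larger than the run time of the code being measured, readings of zero like those in the question are to be expected.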

So, increasing N or using a different method to measure time would be your choices. std::chrono will most likely produce a more accurate timing (but it will measure "wall-time", not CPU-time).

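#include <time.h>   // clock_gettime, timespec (link with -lrt on glibc older than 2.17)

// Returns t2 - t1 in seconds. Subtracting the fields first keeps the
// nanosecond precision that is lost when large tv_sec values are
// converted to double before the subtraction.
double timespec_diff(timespec t2, timespec t1)
{
    return (t2.tv_sec - t1.tv_sec) + (t2.tv_nsec - t1.tv_nsec) / 1000000000.0;
}

timespec t1, t2;
clock_gettime(CLOCK_REALTIME, &t1);
// ... do stuff ...
clock_gettime(CLOCK_REALTIME, &t2);
double t = timespec_diff(t2, t1);   // elapsed wall time in seconds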
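
The std::chrono route mentioned above could look roughly like the following. This is a sketch under assumptions, not the answer's own code: it assumes a C++11 compiler, the helper names `elapsed_seconds` and `add_arrays` are invented for illustration, and `std::chrono::steady_clock` is chosen here because it is monotonic. Like clock_gettime(CLOCK_REALTIME, ...), it measures wall time, not CPU time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double elapsed_seconds(Work work)
{
    typedef std::chrono::steady_clock timer;
    const timer::time_point start = timer::now();
    work();
    const timer::time_point stop = timer::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Element-wise addition, the same operation as the loop in the question.
void add_arrays(const std::vector<unsigned>& a,
                const std::vector<unsigned>& b,
                std::vector<unsigned>& out)
{
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] + b[i];
}
```

A call such as `double s = elapsed_seconds([&] { add_arrays(a, b, out); });` then gives the time for one pass. For very short workloads it may help to repeat the call several times and keep the smallest value.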
Mats Petersson
2

The simplest way to get the time is to use a stub function from OpenMP, omp_get_wtime(). This works with MSVC, GCC, and ICC. With MSVC you don't even need to enable OpenMP. With ICC you can link just the stubs with -openmp-stubs if you like. With GCC you have to use -fopenmp.

#include <omp.h>
#include <stdio.h>

double dtime;
dtime = omp_get_wtime();            // wall-clock time in seconds
foo();                              // the code to be timed
dtime = omp_get_wtime() - dtime;    // elapsed seconds
printf("time %f\n", dtime);
Community
Z boson
  • Seems to work at least. Can you explain where to add -openmp-stubs in Eclipse, when I use ICC? I added it as Linker and Compiler options, but it throws an error, that the function is unknown. For gcc it works – toebs Sep 09 '13 at 15:40
  • You can use `-fopenmp` or `-openmp` with ICC as well just like GCC. I have not used Eclipse (I use QtCreator or just a text editor and shell). What are you using to make a Makefile? – Z boson Sep 09 '13 at 15:43
  • Here is the link showing the [-openmp-stubs](http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/copts/common_options/option_openmp_stubs.htm) option for ICC. – Z boson Sep 09 '13 at 15:50
1

First, the compiler is very likely optimizing your code away. Check your compiler's optimization options.

Since the arrays out[], a[], and b[] are not used by any later code, and no value from them is ever output, the compiler is free to remove the following loops so that they never execute at all:

for(int i=0;i<N;i++){
    a[i] = i;
}

for(int i=0;i<N;i++){
    b[i] = i;
}

for(int i = 0; i < N; ++i)
    out[i] = a[i] + b[i];

Since clock() returns CPU time, the code above consumes almost no time after optimization.

One more thing: set N to a bigger value. 100000 is too small for a performance test, because today's computers run O(n) code at that scale very quickly. (At the larger size below, allocate the arrays on the heap rather than on the stack.)

unsigned int N = 10000000;
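
A sketch of how the larger N might be combined with clock() is shown below. It is an illustration under assumptions, not this answer's code: the helper name `timed_add` is invented, std::vector is used because three arrays of 10,000,000 unsigned ints (about 40 MB each) would not fit on a typical default stack, and the checksum is there to give the loops an observable result, which discourages the optimizer from dropping them.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical helper: fills two heap-allocated arrays of n elements,
// adds them, and returns a checksum of the result. The CPU time spent,
// as reported by clock(), is written to cpu_seconds.
unsigned long long timed_add(std::size_t n, double& cpu_seconds)
{
    assert(n > 0);
    std::vector<unsigned> a(n), b(n), out(n);   // heap storage, not the stack

    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<unsigned>(i);
        b[i] = static_cast<unsigned>(i);
    }
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];

    // 64-bit accumulator: the sum exceeds 32 bits for large n.
    unsigned long long checksum = 0;
    for (std::size_t i = 0; i < n; ++i)
        checksum += out[i];
    const std::clock_t stop = std::clock();

    cpu_seconds = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

Calling `timed_add(10000000, seconds)` and printing both the checksum and `seconds` keeps the work visible to the compiler. Whether the reported time is nonzero still depends on the granularity of clock() on the system.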
lulyon
  • Sadly, this was not the solution. I compiled with no optimizations anyway, but even if I e.g. print "out[]", the elapsed time is still 0. – toebs Sep 09 '13 at 15:34
0

Add this to the end of the code

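// 64-bit accumulator: the total exceeds the range of int for N = 100000
unsigned long long sum = 0;
for(unsigned int i = 0; i < N; i++)
    sum += out[i];
cout << sum << endl;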

Then you will see the times.

Since you don't use a[], b[], or out[] afterwards, the compiler's optimizer removes the corresponding for loops.

Also, build in debug mode instead of release so that the loops are not optimized away; then you will be able to see the time they take.

smttsp
  • Sadly, this was not the solution. I compiled with no optimizations anyway, but even if I e.g. print "out[]", the elapsed time is still 0. I'm still curious why the granularity is 10Hz; everywhere you look, clock is recommended for measuring the elapsed time :\ – toebs Sep 09 '13 at 15:35
  • Did you increase `N`? It is too small. `N=10^8` is OK for a test. You can also do slightly more complicated things in the code, e.g. `int tot = 0; for(i=1:N) tot++; tot /=2;`. I don't know about granularity :( – smttsp Sep 10 '13 at 05:28