CUDA: CUtil timer - confusion on elapsed time

Question

While profiling my program, I saw a delay of up to 100 ms at some point. I checked every operation, but none of them took that long individually. Then I noticed that wherever I place a cudaThreadSynchronize call, the first call takes about 100 ms. So I wrote the example below. When cudaThreadSynchronize is called on the first line, the elapsed time reported at the end is less than 1 ms. If it is not called, the elapsed time is about 110 ms on average.

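#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <cutil.h>  // CUTIL helpers from the CUDA SDK: cut*Timer, CUDA_SAFE_CALL

int main(int argc, char **argv)
{
    cudaThreadSynchronize(); // Comment this out and the elapsed time becomes ~110 ms.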

    unsigned int timer;
    cutCreateTimer(&timer);
    cutStartTimer(timer);

    float *data;
    CUDA_SAFE_CALL(cudaMalloc(&data, sizeof(float) * 1024));

    cutStopTimer(timer);
    printf("CUT Elapsed: %.3f\n", cutGetTimerValue(timer));

    cutDeleteTimer(timer);

    return EXIT_SUCCESS;
}

I think the cudaThreadSynchronize() call at the start triggers the initialization of the CUDA library. Is this the correct way to fully initialize CUDA so that it does not distort the timing of other operations? Is calling cudaThreadSynchronize at the start sufficient and correct, or is there a better way?

possible duplicate of [Linking with 3rd party CUDA libraries slows down cudaMalloc](http://stackoverflow.com/questions/11664627/linking-with-3rd-party-cuda-libraries-slows-down-cudamalloc) — talonmies, Jul 29 '12 at 05:51

score 1 · Accepted Answer · answered Jul 28 '12 at 23:23

To use CUDA, a CUDA context must first be created on the GPU, which takes around 70-100 ms. In your example, cudaThreadSynchronize() creates the context. A context is created only once for your application. When doing timing analysis, I also do a dummy memory copy beforehand to create the context (just as you have done above with cudaThreadSynchronize()).
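The one-time cost described here can be made visible by timing two consecutive allocations. The following is only a minimal sketch, not the asker's code: it assumes a CUDA-capable device and a toolkit recent enough to compile C++11, it replaces the CUTIL timer with std::chrono, and the helper name timed_malloc_ms is hypothetical. On a typical setup the first call would be expected to absorb the context creation, while the second should be far cheaper; the actual numbers depend on the driver and hardware.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: wall-clock time of a single cudaMalloc call, in ms.
// Returns -1.0 if the allocation fails (for example, when no device is present).
static double timed_malloc_ms(size_t bytes) {
    auto t0 = std::chrono::steady_clock::now();
    float *data = nullptr;
    cudaError_t err = cudaMalloc(&data, bytes);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    cudaFree(data);  // released outside the timed region
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    // The first runtime call in the process also pays for context creation.
    double first = timed_malloc_ms(sizeof(float) * 1024);
    // The context already exists here, so only the allocation is measured.
    double second = timed_malloc_ms(sizeof(float) * 1024);
    std::printf("first cudaMalloc:  %.3f ms\n", first);
    std::printf("second cudaMalloc: %.3f ms\n", second);
    return 0;
}
```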

akk

The "classic" way of forcing the creation of the CUDA context prior to a timed section of code is to call cudaFree(0). – njuffa Jul 29 '12 at 00:06
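A minimal sketch of that warm-up pattern, applied to the allocation from the question, might look like the following. It assumes a CUDA-capable device, uses std::chrono in place of the CUTIL timer (CUTIL is no longer shipped with the CUDA samples), and the helper name warm_up_cuda is hypothetical. If the timed section also launched kernels, a cudaDeviceSynchronize() call (the current replacement for the deprecated cudaThreadSynchronize()) would be needed before stopping the timer, because kernel launches are asynchronous.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: force context creation outside any timed region.
// cudaFree(0) frees nothing but still initializes the runtime.
static bool warm_up_cuda() {
    cudaError_t err = cudaFree(0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA warm-up failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    if (!warm_up_cuda()) return EXIT_FAILURE;

    // Timed section: the context already exists, so only cudaMalloc is measured.
    auto t0 = std::chrono::steady_clock::now();
    float *data = nullptr;
    cudaError_t err = cudaMalloc(&data, sizeof(float) * 1024);
    auto t1 = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }
    std::printf("Elapsed: %.3f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());

    cudaFree(data);
    return EXIT_SUCCESS;
}
```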
