I need to measure the time difference between allocating normal CPU memory with new
and a call to cudaMallocManaged
. We are working with unified memory and are trying to figure out the trade-offs of switching things to cudaMallocManaged
. (The kernels seem to run a lot slower, likely due to a lack of caching or something.)
Anyway, I am not sure of the best way to time these allocations. Would one of Boost's process_real_cpu_clock
, process_user_cpu_clock
, or process_system_cpu_clock
give me the best results? Or should I just use the regular std::chrono clocks from C++11? Or should I use the cudaEvent facilities for timing?
I figure that I shouldn't use the CUDA events, because they are for timing GPU processes and would not be accurate for timing CPU calls (correct me if I am wrong there). If I could use cudaEvents on just the cudaMallocManaged
call, what would be the most accurate thing to compare against when timing the new
call? I just don't know enough about memory allocation and timing. Everything I read seems to just make me more confused, due to Boost's and Nvidia's shoddy documentation.