An alternate suggestion: Don't use clock. It's so underspecified that it's nigh impossible to write code that will work fully portably, handling possible wraparound for 32 bit integer clock_t, integer vs. floating point clock_t, etc. (and by the time you write it, you've written so much ugliness you've lost whatever simplicity clock provided).
Instead, use getrusage. It's not perfect, and it might do a little more than you strictly need, but:
- The times it returns are guaranteed to be measured relative to 0 (where the value returned by clock at the beginning of a program could be anything)
- It lets you choose whether to include stats from child processes you've waited on, by querying RUSAGE_SELF, RUSAGE_CHILDREN, or both (clock either does or doesn't, in a non-portable fashion)
- It separates the user and system CPU times; you can use either one, or both, your choice
- Each time is expressed explicitly as a pair of values: a time_t number of seconds and a suseconds_t number of additional microseconds. Since it doesn't try to encode a total microsecond count into a single time_t/clock_t (which might be 32 bits), wraparound can't occur until you've hit at least 68 years of CPU time (if you manage that on a system with 32 bit time_t, I want to know your IT folks; the only way I can imagine hitting that is on a system with hundreds of cores, running for weeks, and any such system would be 64 bit at this point).
- The parts of the result you need are specified by POSIX, so it's portable to just about everywhere but Windows (where you're stuck writing preprocessor-controlled code to switch to GetProcessTimes when compiled for Windows)
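The preprocessor switch mentioned in the last bullet could look roughly like the sketch below. The names cpu_times_us and filetime_to_us are made up for illustration; the only assumptions are that _WIN32 is defined when building for Windows and that you want both times in microseconds (GetProcessTimes reports 100-nanosecond units, hence the division by 10).

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a FILETIME (100 ns units) to microseconds. */
static uint64_t filetime_to_us(const FILETIME *ft)
{
    ULARGE_INTEGER v;
    v.LowPart = ft->dwLowDateTime;
    v.HighPart = ft->dwHighDateTime;
    return v.QuadPart / 10;
}

/* Hypothetical wrapper: user and system CPU time of this process, in microseconds.
   Returns 0 on success, -1 on failure. */
int cpu_times_us(uint64_t *user_us, uint64_t *sys_us)
{
    FILETIME creation, exited, kernel, user;
    if (!GetProcessTimes(GetCurrentProcess(), &creation, &exited, &kernel, &user))
        return -1;
    *user_us = filetime_to_us(&user);
    *sys_us = filetime_to_us(&kernel);
    return 0;
}
#else
#include <sys/time.h>
#include <sys/resource.h>

/* Same hypothetical wrapper, POSIX flavor, built on getrusage. */
int cpu_times_us(uint64_t *user_us, uint64_t *sys_us)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_us = (uint64_t)ru.ru_utime.tv_sec * 1000000ULL + (uint64_t)ru.ru_utime.tv_usec;
    *sys_us = (uint64_t)ru.ru_stime.tv_sec * 1000000ULL + (uint64_t)ru.ru_stime.tv_usec;
    return 0;
}
#endif
```

Callers only ever see cpu_times_us, so the rest of the timing code stays identical on both platforms.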
Conveniently, since you're on POSIX systems (I think?), clock already reports microseconds, not real ticks (POSIX specifies that CLOCKS_PER_SEC equals 1000000 on XSI-conformant systems), so the values already align. You can just rewrite your function as:
#include <sys/time.h>
#include <sys/resource.h>

// u64, accum_ticks and work() are assumed to come from your existing code

static inline u64 elapsed(const struct timeval *beg, const struct timeval *end)
{
    // A negative tv_usec difference is fine: the unsigned arithmetic wraps it back into range
    return (end->tv_sec - beg->tv_sec) * 1000000ULL + (end->tv_usec - beg->tv_usec);
}

void f()
{
    struct rusage beg, end;
    // Not checking return codes, because the only two documented failure cases are passing
    // an unmapped memory address for the struct addr or an invalid who flag, neither of which
    // we're doing, easily verified by inspection
    getrusage(RUSAGE_SELF, &beg);
    work();
    getrusage(RUSAGE_SELF, &end);
    accum_ticks += elapsed(&beg.ru_utime, &end.ru_utime);
    // And if you want to include system time as well, add:
    accum_ticks += elapsed(&beg.ru_stime, &end.ru_stime);
}

u64 elapsed_CPU_us()
{
    return accum_ticks; // It's already stored natively in microseconds
}
On Linux 2.6.26+, you can replace RUSAGE_SELF with RUSAGE_THREAD to limit the stats to resources used by the calling thread alone, rather than the whole process (which might help if other threads are doing unrelated work and you don't want their stats polluting yours), in exchange for less portability.
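A minimal sketch of that swap, with a fallback for systems that lack the flag. The macro CPU_WHO and the function scoped_cpu_us are hypothetical names; the sketch assumes glibc, where (per the getrusage man page) RUSAGE_THREAD is only exposed when _GNU_SOURCE is defined before any header is included.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* must come before any #include for glibc to expose RUSAGE_THREAD */
#endif
#include <stdint.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical selector: per-thread stats where available, per-process otherwise. */
#ifdef RUSAGE_THREAD
#define CPU_WHO RUSAGE_THREAD
#else
#define CPU_WHO RUSAGE_SELF
#endif

/* Hypothetical helper: user + system CPU time for the selected scope, in microseconds.
   Returns UINT64_MAX if getrusage fails. */
uint64_t scoped_cpu_us(void)
{
    struct rusage ru;
    if (getrusage(CPU_WHO, &ru) != 0)
        return UINT64_MAX;
    return ((uint64_t)ru.ru_utime.tv_sec + (uint64_t)ru.ru_stime.tv_sec) * 1000000ULL
         + (uint64_t)ru.ru_utime.tv_usec + (uint64_t)ru.ru_stime.tv_usec;
}
```

Note that the fallback silently widens the measurement to the whole process, so on a multithreaded program the two branches are not equivalent; whether that is acceptable depends on what you're measuring.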
Yes, it's a little more work to compute the time (two additions/subtractions and one multiplication by a constant, doubled if you want both user and system time, where clock in the simplest usage is a single subtraction), but:
- Handling clock wraparound adds more work (and it's branching work, which this code doesn't have; admittedly, it's a fairly predictable branch), narrowing the gap
- Integer multiplication is nearly as cheap as addition and subtraction on modern chips (recent x86-64 chips can start a new integer multiply every clock cycle, with a latency of only a few cycles), so you're not adding orders of magnitude more work, and in exchange, you get more control, more guarantees, and greater portability
Note: You might see code using clock_gettime with clock ID CLOCK_PROCESS_CPUTIME_ID, which would simplify your code when you just want total CPU time, not split up by user vs. system, without all the other stuff getrusage provides (perhaps it would be faster, simply by virtue of retrieving less data). Unfortunately, while clock_gettime is guaranteed by POSIX, the CLOCK_PROCESS_CPUTIME_ID clock ID is not, so you can't use it reliably on all POSIX systems (FreeBSD at least seems to lack it). All the parts of getrusage we're relying on are fully standard, so it's safe.
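If you'd still like to use clock_gettime where it's available, one way to hedge is to test for the clock ID at compile time and fall back to getrusage otherwise. This is only a sketch: total_cpu_us is a hypothetical name, and it assumes CLOCK_PROCESS_CPUTIME_ID is a preprocessor macro on platforms that support it (true on Linux, not verified elsewhere).

```c
#include <stdint.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper: total (user + system) CPU time of the process in microseconds.
   Returns UINT64_MAX if every available source fails. */
uint64_t total_cpu_us(void)
{
#ifdef CLOCK_PROCESS_CPUTIME_ID
    struct timespec ts;
    /* Optional clock: use it when the platform defines it and the call succeeds */
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) == 0)
        return (uint64_t)ts.tv_sec * 1000000ULL + (uint64_t)ts.tv_nsec / 1000;
#endif
    /* Standard fallback: sum the user and system times from getrusage */
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return UINT64_MAX;
    return ((uint64_t)ru.ru_utime.tv_sec + (uint64_t)ru.ru_stime.tv_sec) * 1000000ULL
         + (uint64_t)ru.ru_utime.tv_usec + (uint64_t)ru.ru_stime.tv_usec;
}
```

The two sources may not agree to the microsecond, so stick to one helper for both the start and end readings of any interval.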