For statistical purposes I want to accumulate the total CPU time used by one function of a program, in microseconds. It must work on two systems: one where sizeof(clock_t) = 8 (RedHat) and another where sizeof(clock_t) = 4 (AIX). On both machines clock_t is a signed integer type and CLOCKS_PER_SEC = 1000000 (so one tick is one microsecond, but I don't rely on that assumption in the code and use the macro instead).

What I have is equivalent to something like this (but encapsulated in some fancy classes):

#include <time.h>

typedef unsigned long long u64;
u64 accum_ticks = 0;

void f()
{
   clock_t beg = clock();
   work();
   clock_t end = clock();

   accum_ticks += (u64)(end - beg); // (1)
}

u64 elapsed_CPU_us()
{
   return accum_ticks * 1e+6 / CLOCKS_PER_SEC;
}

But on the 32-bit AIX machine, where clock_t is an int, it will overflow after 35m47s. Suppose that in some call beg equals 35m43s since the program started, and work() takes 10 CPU-seconds, causing end to overflow. Can I trust line (1) for this and subsequent calls to f() from then on? f() is guaranteed never to take more than 35 minutes of execution, of course.

If I can't trust line (1) at all, even on my particular machine, what alternatives do I have that don't require importing any third-party library? (I can't copy libraries to the system, and I can't use <chrono> because it isn't available on our AIX machines.)

NOTE: I can use kernel headers and the precision I need is in microseconds.

ABu
  • Don't tag questions that use notations that are only valid in C++ with the C tag too. – Jonathan Leffler Nov 16 '22 at 17:59
  • If your code has [Undefined Behaviour](https://en.cppreference.com/w/cpp/language/ub) then *all* bets are off and you cannot assume anything. – Jesper Juhl Nov 16 '22 at 18:01
  • An underflow/overflow on a (signed) integer typically causes undefined behaviour, in addition. An overflow/underflow on an unsigned integer is well-defined though. You can make this reliable by converting to an unsigned type first. Also, you want to have unit tests on the different values that must be handled correctly. – Ulrich Eckhardt Nov 16 '22 at 18:01
  • @JonathanLeffler fair enough. I have removed references to `std::` now. Because `unsigned long long` is valid since `C11` right? – ABu Nov 16 '22 at 18:01
  • It isn't just a matter of whether the `work` takes more than 35 minutes but if `clock` has wrapped during it. If `end < beg` you'll need to compensate. – Weather Vane Nov 16 '22 at 18:02
  • The `unsigned long long` type is a part of C since C99, but yes, the code is now valid C (and maybe valid C++ if you've got an appropriate `using namespace` in scope). OTOH, you should still choose one of the two languages unless your question is about the interworking of the two languages. – Jonathan Leffler Nov 16 '22 at 18:02
  • @WeatherVane but, "WHERE" to cast or how to compensate? Because, in case `clock` overflows "during it", what would `clock` return to me on the `end = clock()` line? `-1`, `-2^31`, will it start counting from `0`, some undefined value? – ABu Nov 16 '22 at 18:05
  • Note that your cast comes too late: the arithmetic is done using a signed type, and if overflow occurs, you've got undefined behaviour because of signed arithmetic overflow. – Jonathan Leffler Nov 16 '22 at 18:05
  • @JonathanLeffler maybe there's something I don't get. I mean, the overflow also happens INSIDE clock. If beg = `2^31 - 100`, and `work()` takes 200 clocks, what is the value of `end = clock()`? – ABu Nov 16 '22 at 18:11
  • Maybe you can use (e.g.) `setitimer/getitimer` with `ITIMER_VIRTUAL`. Set a large/infinite timeout value. Call the function. Then, use `getitimer` and look at the remaining time. – Craig Estey Nov 16 '22 at 18:12
  • Note that `1e+6` is a `double` value, so the `return` is approximately equivalent to: `return (unsigned long long)(((double)accum_ticks * 1e+6) / (double)CLOCKS_PER_SEC);`. I also note that `accum_ticks += unsigned long long(end - beg);` is C++ and not C; C-style casts are enclosed in parentheses, as in the code I show for the `return` statement. – Jonathan Leffler Nov 16 '22 at 18:12
  • `unsigned long elapsed = (unsigned long)end - (unsigned long)beg;` should do it. If the period is greater than 35 minutes there is nothing that can be done. – Weather Vane Nov 16 '22 at 18:12
  • No: there's no overflow in `clock()` — it returns a value. It just so happens that if you reach the limit, one return value could be large and the next could be small, but there's no overflow in the function (that you can spot). Your overflow (potentially) happens in the difference operation where you add more ticks to `accum_ticks`. – Jonathan Leffler Nov 16 '22 at 18:14
  • The clock counter does not overflow: it wraps as if it were unsigned, and continues counting. – Weather Vane Nov 16 '22 at 18:18
  • @WeatherVane Ok. That clarification fixes my understanding. – ABu Nov 16 '22 at 18:19
  • Suppose `beg = 0x7fffffff` and `end = 0x80000003`, you get `0x80000003 - 0x7fffffff` which is `4`. Provided you work with an unsigned `elapsed` *variable* to ensure the difference is correct. Or suppose `beg = 0xffffffff` and `end = 0x00000003`, you get `0x00000003 - 0xffffffff` which is `4`. – Weather Vane Nov 16 '22 at 18:22
  • Related: https://stackoverflow.com/questions/31967370/is-detecting-unsigned-wraparound-via-cast-to-signed-undefined-behavior – dbush Nov 16 '22 at 18:32
  • @Peregring-lk: Is there a reason to specifically use `clock`? POSIX provides `getrusage`, which has a much better specification (`clock` doesn't specify whether waited for child process times are included, doesn't specify whether `clock_t` is even integer or floating point let alone the size, etc.). `getrusage` lets you specify whether or not to include resources used by child processes, breaks out user CPU and system CPU time separately, and specifies that both user and system CPU times will be expressed as a struct that combines a `time_t` seconds count with an integer microseconds count. – ShadowRanger Nov 16 '22 at 18:43
  • @ShadowRanger maybe is what I need. I'll take a look. – ABu Nov 16 '22 at 18:45
  • Sadly, even with `struct timeval` composed of a `time_t tv_sec` and `suseconds_t tv_usec`, neither type's size is strictly specified, but at least they're both integers, and since they're both measuring CPU time for the process starting from zero, not an arbitrary start point, they won't overflow even on a 32 bit system unless your program uses ~68 years worth of CPU time. The computation is simple at least; `(end.ru_utime.tv_sec - beg.ru_utime.tv_sec) * 1000000ULL + end.ru_utime.tv_usec - beg.ru_utime.tv_usec` gets you the number of user CPU microseconds elapsed. – ShadowRanger Nov 16 '22 at 18:46
  • No need to handle wraparound, because `end`'s `tv_sec` is always `>=` to `beg`'s `tv_sec`. When they're equal, `end`'s `tv_usec` is guaranteed to be `>=` to `beg`'s (so no wraparound there), and if `end`'s `tv_usec` isn't `>=` to `beg`'s, it means `end.ru_utime.tv_sec` is strictly greater than `beg.ru_utime.tv_sec`, so the amount subtracted still can't reduce any value below zero. – ShadowRanger Nov 16 '22 at 18:54
  • If your AIX system is 5.3 or newer, use 64-bit instead of 32. Also you don't have to invent types like u64, use `stdint.h` – Lorinczy Zsigmond Nov 16 '22 at 20:31

1 Answer

An alternate suggestion: Don't use clock. It's so underspecified it's nigh impossible to write code that will work fully portably, handling possible wraparound for 32 bit integer clock_t, integer vs. floating point clock_t, etc. (and by the time you write it, you've written so much ugliness you've lost whatever simplicity clock provided).

Instead, use getrusage. It's not perfect, and it might do a little more than you strictly need, but:

  1. The times it returns are guaranteed to operate relative to 0 (where the value returned by clock at the beginning of a program could be anything)
  2. It lets you specify if you want to include stats from child processes you've waited on (clock either does or doesn't, in a non-portable fashion)
  3. It separates the user and system CPU times; you can use either one, or both, your choice
  4. Each time is expressed explicitly in terms of a pair of values, a time_t number of seconds, and a suseconds_t number of additional microseconds. Since it doesn't try to encode a total microsecond count into a single time_t/clock_t (which might be 32 bits), wraparound can't occur until you've hit at least 68 years of CPU time (if you manage that, on a system with 32 bit time_t, I want to know your IT folks; only way I can imagine hitting that is on a system with hundreds of cores, running weeks, and any such system would be 64 bit at this point).
  5. The parts of the result you need are specified by POSIX, so it's portable to just about everywhere but Windows (where you're stuck writing preprocessor controlled code to switch to GetProcessTimes when compiled for Windows)

Conveniently, since you're on POSIX systems (I think?), clock is already expressed as microseconds, not real ticks (POSIX specifies that CLOCKS_PER_SEC equals 1000000), so the values already align. You can just rewrite your function as:

#include <sys/time.h>
#include <sys/resource.h>

static inline u64 elapsed(const struct timeval *beg, const struct timeval *end)
{
    return (end->tv_sec - beg->tv_sec) * 1000000ULL + (end->tv_usec - beg->tv_usec);
}

void f()
{
   struct rusage beg, end;
   // Not checking return codes, because only two documented failure cases are passing
   // an unmapped memory address for the struct addr or an invalid who flag, neither of which
   // we're doing, easily verified by inspection
   getrusage(RUSAGE_SELF, &beg);
   work();
   getrusage(RUSAGE_SELF, &end);

   accum_ticks += elapsed(&beg.ru_utime, &end.ru_utime);
   // And if you want to include system time as well, add:
   accum_ticks += elapsed(&beg.ru_stime, &end.ru_stime);
}

u64 elapsed_CPU_us()
{
   return accum_ticks; // It's already stored natively in microseconds
}

On Linux 2.6.26+, you can replace RUSAGE_SELF with RUSAGE_THREAD to count only the resources used by the calling thread rather than the whole process (which might help if other threads are doing unrelated work and you don't want their stats polluting yours), in exchange for less portability.
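
One way to keep the per-thread variant optional might look like the sketch below. `CPU_WHO`, `timeval_us` and `cpu_us_now` are hypothetical names introduced for illustration; the sketch assumes glibc, where `RUSAGE_THREAD` is only visible when `_GNU_SOURCE` is defined before the first include, and it falls back to `RUSAGE_SELF` elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc guards RUSAGE_THREAD behind this macro */
#endif
#include <sys/time.h>
#include <sys/resource.h>

/* Assumption: fall back to whole-process accounting where the
 * Linux-specific RUSAGE_THREAD is not available. */
#ifdef RUSAGE_THREAD
#define CPU_WHO RUSAGE_THREAD
#else
#define CPU_WHO RUSAGE_SELF
#endif

/* Hypothetical helper: convert a timeval to microseconds. */
static unsigned long long timeval_us(const struct timeval *tv)
{
    return (unsigned long long)tv->tv_sec * 1000000ULL
         + (unsigned long long)tv->tv_usec;
}

/* Hypothetical helper: user + system CPU microseconds consumed so far
 * by CPU_WHO. Returns 0 if getrusage fails. */
static unsigned long long cpu_us_now(void)
{
    struct rusage ru;

    if (getrusage(CPU_WHO, &ru) != 0)
        return 0;
    return timeval_us(&ru.ru_utime) + timeval_us(&ru.ru_stime);
}
```

Calling `cpu_us_now()` before and after `work()` and adding the difference to `accum_ticks` would then take the place of the two `getrusage` calls in `f()`.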

Yes, it's a little more work to compute the time (two additions/subtractions and one multiplication by a constant, doubled if you want both user and system time, where clock in the simplest usage is a single subtraction), but:

  1. Handling clock wraparound adds more work (and a branch, which this code doesn't have; admittedly, it's a fairly predictable branch), narrowing the gap
  2. Integer multiplication is nearly as cheap as addition and subtraction on modern chips (recent x86-64 chips can start an integer multiply every clock cycle, with a latency of only a few cycles), so you're not adding orders of magnitude more work, and in exchange, you get more control, more guarantees, and greater portability

Note: You might see code using clock_gettime with clock ID CLOCK_PROCESS_CPUTIME_ID, which would simplify your code when you just want total CPU time, not split up by user vs. system, without all the other stuff getrusage provides (perhaps it would be faster, simply by virtue of retrieving less data). Unfortunately, while clock_gettime is guaranteed by POSIX, the CLOCK_PROCESS_CPUTIME_ID clock ID is not, so you can't use it reliably on all POSIX systems (FreeBSD at least seems to lack it). All the parts of getrusage we're relying on are fully standard, so it's safe.
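
For completeness, the `clock_gettime` variant mentioned in the note could be sketched like this. `process_cpu_us` is a hypothetical name, the `-1` return for "unavailable" is a convention chosen for this sketch, and the `#ifdef` only covers systems that expose `CLOCK_PROCESS_CPUTIME_ID` as a macro.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request clock_gettime from <time.h> */
#endif
#include <time.h>

/* Hypothetical helper: total CPU time (user + system) consumed by the
 * process, in microseconds. Returns -1 when the clock is unavailable,
 * which is a convention chosen for this sketch. */
static long long process_cpu_us(void)
{
#ifdef CLOCK_PROCESS_CPUTIME_ID
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    /* tv_nsec is in [0, 1000000000), so the division truncates to
     * whole microseconds. */
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
#else
    return -1;
#endif
}
```

A caller would still need the `getrusage` path as a fallback wherever this returns `-1`, which is part of why the answer sticks to `getrusage` throughout.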

ShadowRanger
  • Minor: `(end->tv_sec - beg->tv_sec) * 1000000ULL + end->tv_usec - beg->tv_usec` could use narrower, perhaps faster math, with `(end->tv_sec - beg->tv_sec) * 1000000ULL + (end->tv_usec - beg->tv_usec)` – chux - Reinstate Monica Nov 16 '22 at 19:52
  • @chux-ReinstateMonica: I avoided doing that just because I didn't want to verify whether, when the computation comes up negative (when at least a second has passed, the `end` microseconds might be smaller), the behavior for `unsigned long long value + negative signed int value` would be 100% portable. It might be safe, but working purely with positive values removes my doubts; in practice, at least on x86-64, the performance for 64 bit addition/subtraction is not meaningfully distinct from 32 bit. Do you know off-hand if the standard guarantees it's safe? I can never remember these details. – ShadowRanger Nov 16 '22 at 21:07
  • `(end->tv_usec - beg->tv_usec)` is safe as long as `.tv_usec`, a signed integer type, is in the [0...1000000000) range. – chux - Reinstate Monica Nov 16 '22 at 22:17
  • @chux-ReinstateMonica: Yeah, that part is definitely safe. The question is whether, if the result of that computation is negative (because `end->tv_usec` is less than `beg->tv_usec`), is it safe to add that smaller negative value to the larger unsigned value. I *think* it is (for matched sizes, [it is](https://stackoverflow.com/q/7544123/364696)) but the extra complication of it needing to both become unsigned *and* promote from 32 to 64 bits made me a *little* leery. – ShadowRanger Nov 16 '22 at 22:35
  • Basically, I'm not sure if [this program](https://tio.run/##RU5LCsIwFNznFI@K0I@19VOrxHqCbsWNmzRNYyCmJU1BEK9ujFXoLGYew8zwaNfFnFJrZ0JROdQMjr2pRbu8nRASysCdCOUH6InAYVC94IrVIFvFf0SggNV6s812@f6QTte5LPHY@Te@W5XLxhlGo99pZzW@N5dyuCpv4ZYiqAIMSQgZhMkY0swMWkGK0cvaN20k4b2NL0RKx@xhNJnU/V3QKFrlHw "C++ (gcc) – Try It Online") is guaranteed to *always* print `12345678901234567885` (the result of adding `-5` to `12345678901234567890ULL` ) for all C and C++ standards and all common compilers thereof. – ShadowRanger Nov 16 '22 at 22:36
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/249679/discussion-between-chux-reinstate-monica-and-shadowranger). – chux - Reinstate Monica Nov 16 '22 at 22:37
  • @chux-ReinstateMonica: Cool, don't mind me, always being paranoid about anything but nice, predictable, purely unsigned integers. :-) I've tweaked the answer. – ShadowRanger Nov 16 '22 at 22:37
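
The mixed signed/unsigned arithmetic discussed in these comments can be illustrated with a small sketch. `combine_us` is a hypothetical helper that mirrors the shape of `(seconds difference) * 1000000ULL + (microseconds difference)`; the explicit casts only spell out the conversions that C applies implicitly when a narrower signed operand meets an `unsigned long long`.

```c
/* Hypothetical helper mirroring the shape of
 * (sec difference) * 1000000ULL + (usec difference).
 * Assumes sec_diff >= 0 and that the true total is non-negative. */
static unsigned long long combine_us(long long sec_diff, long usec_diff)
{
    /* usec_diff may be negative. Converting it to unsigned long long is
     * defined as reduction modulo 2^64, and the unsigned addition also
     * wraps modulo 2^64, so the sum comes out as if computed exactly. */
    return (unsigned long long)sec_diff * 1000000ULL
         + (unsigned long long)usec_diff;
}
```

For example, `combine_us(2, -800000)` yields `1200000`: two whole seconds elapsed, but the ending microsecond field was 800000 smaller than the starting one.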