0

I am new to C++ and I need to measure the total time for different parts of a recursive function. A simple example to show where I get so far is:

#include <iostream>
#include <unistd.h>
#include <chrono>
using namespace std;
using namespace std::chrono;

int recursive(int);
void foo();
void bar();

int main() {
    int n = 5;  // this value is known only at runtime
    int result = recursive(n);
    return 0;
}

int recursive(int n) {
    auto start = high_resolution_clock::now();
    if (n > 1) { recursive(n - 1); n = n - 1; }
    auto stop = high_resolution_clock::now();
    auto duration_recursive = duration_cast<microseconds>(stop - start); 

    cout << "time in recursive: " << duration_recursive.count() << endl;
    //
    // .. calls to other functions and computations parts I don't want to time
    //
    start = high_resolution_clock::now();
    foo();
    stop = high_resolution_clock::now();
    auto duration_foo = duration_cast<seconds>(stop - start);
    cout << "time in foo: " << duration_foo.count() << endl;
    //
    // .. calls to other functions and computations parts I don't want to time
    //
    start = high_resolution_clock::now();
    bar();
    stop = high_resolution_clock::now();
    auto duration_bar = duration_cast<seconds>(stop - start);
    cout << "time in bar: " << duration_bar.count() << endl;

    return 0;
}

void foo() { // a complex function
sleep(1);
}

void bar() { // another complex function
sleep(2);
}

I want the total time for each of the functions, for instance, for foo() it is 5 seconds, while now I always get 1 second. The number of iterations is known only at runtime (n=5 here is fixed just for simplicity).

To compute the total time for each of the functions I did try replacing the type above by using static and accumulate the results but didn't work.

  • 1
    Sounds like a job for external profiling tools like valgrind – perivesta Jul 14 '22 at 10:22
  • using static local variables should work (you will only be able to measure the total time once though). Can you show the code using `static`? – 463035818_is_not_an_ai Jul 14 '22 at 10:23
  • 1
    Suggestion: [Don't use `high_resolution_clock`](https://stackoverflow.com/a/37440647/7582247) - use `steady_clock`. – Ted Lyngmo Jul 14 '22 at 10:26
  • Use what ever profiling tool your tool-chain supports. – Richard Critten Jul 14 '22 at 10:26
  • The same time gets counted more than once - in the third recursion, the time is in recursive, and it's also in recursive, and it's also in foo. So how do you want the measurement to work? – user253751 Jul 14 '22 at 10:37
  • 1
    @perivesta: `valgrind` is a memory profiler. For execution *speed*, it runs your machine code under an interpreter, so it's not realistic at all in terms of bottlenecks your real code might have, like ALU latency vs. throughput. (A slow interpreter will make ALU latency irrelevant, vs. a huge bottleneck for the sum of an array of floats.) In general, speed ratios between two things under valgrind might or might not be similar to the ratio when running on real hardware. There are actual profilers like Linux `perf record` / `perf report` which use hardware events to sample and find hot spots – Peter Cordes Jul 14 '22 at 11:13

1 Answers1

1

You can use some container to store the times, pass it by reference and accumulate the times. For example with a std::map<std::string,unsinged> to have labels:

int recursive(int n, std::map<std::string,unsigned>& times) { 
    if (n >= 0) return;
    // measure time of foo
    times["foo"] += duration_foo;
    // measure time of bar
    times["bar"] += duration_bar;
    // recurse
    recursive(n-1,times);
}

Then

std::map<std::string,unsigned> times;
recursive(200,times);
for (const auto& t : times) {
    std::cout << t.first << " took total : " << t.second << "\n";
}
463035818_is_not_an_ai
  • 109,796
  • 11
  • 89
  • 185
  • 1
    Timing overhead (and cache pollutions) would be lower if you used an `enum` to index an array of durations, rather than a `std::map` with `std::string` keys. Getting a duration takes a significant amount of code, but even with compile-time eval of hashing a `string` constructed from a string-literal, you still have code to search a red-black tree (usually). (`std::unordered_map` is typically a hash table.) Anyway, minimizing timing overhead is probably a good idea if you're going to sprinkle it around inside your function. – Peter Cordes Jul 15 '22 at 09:55
  • @PeterCordes usage of the map is outside of the blocks that get measured, but I do get your point – 463035818_is_not_an_ai Jul 15 '22 at 09:57
  • 1
    Polluting code/data caches with more code and touching more data will potentially have more of an effect on the timed regions. And could impact optimization of the timed regions, although any std::chrono function is going to involve a non-inline function call on a normal C++ implementation. Lower overhead and impact on optimization could be to count time in x86 reference cycles with `__rdtsc()` (which can inline to a single instruction). That would be small enough for out-of-order exec to see around it (only a dozen or so uops on modern x86), although maybe for time-accounting that's bad. – Peter Cordes Jul 15 '22 at 10:04