
I am learning C++ and I was toying with linked lists.

Consider the following extremely simple implementation:


#include <chrono>
#include <cstddef>
#include <iostream>


static std::size_t n_constructors {0};


template <typename T>
class list {
public:
    list(T n) : item{n} { n_constructors++; }
    list(T n, list* l) : item{n}, next {l} { n_constructors++; }
    list(const list&);
    ~list() { delete next; }
    void insert(T n);
    void print() const;
    list reverse() const;
    list reverse_iter() const;
private:
    T item;
    list* next {nullptr};
};


template <typename T>
list<T>::list(const list& src)
    :
    item {src.item}
{
    n_constructors++;
    if (src.next) {
        next = new list {*src.next};
    }
}


template <typename T>
void list<T>::insert(T n) {
    if (next == nullptr)
        next = new list {n};
    else
        next->insert(n);
}


template <typename T>
void list<T>::print() const {
    std::cout << item << " ";
    if (next)
        next->print();
    else
        std::cout << std::endl;
}


template <typename T>
list<T> list<T>::reverse() const {
    if (next) {
        auto s = next->reverse();
        s.insert(item);
        return s;
    } else {
        return list {item};
    }
}


template <typename T>
list<T> list<T>::reverse_iter() const {
    auto sptr = new list<T> {item};
    auto prev = next;
    auto link = next;
    while (link) {
        auto tmp = new list<T>(link->item, sptr);
        link = link->next;
        prev = link;
        sptr = tmp;
    }
    return *sptr;
}

As you can see, I wrote two reverse functions: an iterative one and a recursive one. To test them out, I tried this:

int main() {
    list<int> s{1};
    for (int i = 2; i < 10000; ++i)
        s.insert(i);
    std::cout << "initial constructor\n";
    std::cout << "called " << n_constructors << " constructors.\n";
    auto t0 = std::chrono::high_resolution_clock::now();
    auto s2 = s.reverse_iter();
    auto t = std::chrono::high_resolution_clock::now();
    std::cout << "iterative reverse\n";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t - t0).count() <<" ms\n";
    std::cout << "called " << n_constructors << " constructors.\n";
    t0 = std::chrono::high_resolution_clock::now();
    auto s3 = s.reverse();
    t = std::chrono::high_resolution_clock::now();
    std::cout << "recursive reverse\n";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t - t0).count() <<" ms\n";
    std::cout << "called " << n_constructors << " constructors.\n";
}

Here's a typical result of my benchmark (compiled with g++ 9.3.0, optimiser turned on with -O2):

initial constructor
called 9999 constructors.
iterative reverse
0 ms
called 29997 constructors.
recursive reverse
1692 ms
called 50034995 constructors.

The difference in performance is staggering. I guess that the problem with the recursive version is the much larger number of allocations, so I implement a move constructor:

template <typename T>
list<T>::list(list&& src)
    :
    item {src.item},
    next {src.next}
{
    n_constructors++;
    src.next = nullptr;
    src.item = 0;
}

This is the result:

initial constructor
called 9999 constructors.
iterative reverse
0 ms
called 29997 constructors.
recursive reverse
90 ms
called 49994 constructors.

Very good: now the recursive function does just as many allocations as the iterative one. However, the performance difference is still large. I try again with 100 000 elements:

initial constructor
called 99999 constructors.
iterative reverse
7 ms
called 299997 constructors.
recursive reverse
9458 ms
called 499994 constructors.

In my opinion the recursive reverse is much more readable and elegant than the iterative one.

Why is the recursive version so much slower? Is there something I can do to make it faster?

Edit

I have added plots of the empirical order of growth for the two algorithms, as recommended by Will Ness in the comments.

[Plot: empirical order of growth on an Intel i7-6700HQ]

J. D.
  • It seems you're asking a [similar question](https://stackoverflow.com/questions/35215420/are-loops-really-faster-than-recursion). – WhozCraig Oct 24 '20 at 15:37
  • `reverse_iter` is linear, `reverse` is quadratic (since it calls `insert` which is itself linear). You spend time traversing links, not calling constructors. – Igor Tandetnik Oct 24 '20 at 15:38
  • @JaMiT if there is no way to make recursion faster than looping, I would be happy just to know why recursion is so much slower (3 orders of magnitude!) despite doing as many allocations as the iterative version. – J. D. Oct 24 '20 at 15:39
  • It's not about recursion per se. It's about having to traverse the half-built list every time to insert the node at the end. The way `reverse` works is, it traverses the list to the end, and makes a copy of the last node. Then it inserts the copy of the next-to-last node into this one-element list. Then it inserts a copy of the previous node into this two-element list, traversing it to the end. And so on. For every node, it ends up traversing the partially-built reverse list all the way from start to end. In contrast, `reverse_iter` does all the work in a single pass. – Igor Tandetnik Oct 24 '20 at 15:43
  • @IgorTandetnik makes sense, thanks. I think I could make the recursive algorithm linear again just by keeping a pointer to the last element of the list, does it sound right? – J. D. Oct 24 '20 at 15:47
  • You could, yes. Though I suspect at that point it won't be much simpler than `reverse_iter` (By the way, `prev` is not used for anything in `reverse_iter`; you can drop it and shave two lines of code). Also, make sure to compile with all optimizations enabled for benchmarking; most recursion in your code is tail recursion, the compiler should be able to covert it to iteration automatically. – Igor Tandetnik Oct 24 '20 at 15:51

1 Answer


You're not only testing recursion against iteration here, but comparing two algorithms with different orders of growth.

Your reverse has the complexity T(n) = T(n - 1) + O(n), because insert traverses the partially built result on every call, and so it is quadratic in the number of links. Your reverse_iter has only linear complexity. The comparison is therefore unfair.

Ami Tavory
  • Did you mean T(n) = T(n - 1) + O(n)? – J. D. Oct 24 '20 at 15:50
  • @J.D. that would make it quadratic, n*n. also, performance comparisons at *one* size point are largely meaningless. see [this](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth). the meaningful one is the log-log plot of time vs problem size, with at least two size points. – Will Ness Oct 24 '20 at 15:56
  • @WillNess point taken, thanks. I think the answer by Ami Tavory wanted indeed to point out that `reverse` is quadratic, so it cannot be T(n) = T(n - 1) + O(1) which is linear in n. – J. D. Oct 24 '20 at 16:01
  • @J.D. ah, right. I thought it referred to the other one. :) btw, the meaningful comparison is the log-log plot of time vs problem size, with at least two size points. see [this](https://stackoverflow.com/questions/29992904/enumerate-factors-of-a-number-directly-in-ascending-order-without-sorting) for an excellent example. that's for a power law growth; for exponential it's the log-plot of course. – Will Ness Oct 24 '20 at 16:05
  • @WillNess It's a very meaningful plot and I will try to make a similar plot for my code (tomorrow I hope). The log log plot should make it possible to compare the two algorithms despite the vast difference in performance. I see the cry for empirical orders of growth in your profile page, are you worried that asymptotic order of growth might be too asymptotic? – J. D. Oct 24 '20 at 16:37
  • @J.D. all I'm saying is, use it. I don't draw whole plots, myself, just do some 2-3-4 measurements at 2x the size (or 10x, whatever), usually it's enough. – Will Ness Oct 24 '20 at 16:43
  • @WillNess I finally found the time to add the plots in my original post. It was also quite fun to generate the data---I hope my CPU was not throttled down too much as I did this benchmark on a laptop and I could distinctively feel it heat up. – J. D. Oct 27 '20 at 18:07