3

I am trying to wrap my head around memoization in C++, and I am working through an example with the Golomb sequence.

#include <iostream>
#include <unordered_map>
#include <chrono>

int golom(int n);
int golomS(int n, std::unordered_map<int, int> golomb);

int main(int argc, char* argv[])
{
    std::unordered_map<int, int> hashTable;

    
    int value = 7;
    
    auto start = std::chrono::high_resolution_clock::now(); 
    std::cout << golomS(4, hashTable) << std::endl;
    auto stop = std::chrono::high_resolution_clock::now();
    
    auto start1 = std::chrono::high_resolution_clock::now();
    std::cout << golom(4) << std::endl;
    auto stop1 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(stop1 - start1);

    std::cout << "Time taken by function 1: "
           << duration.count() << " microseconds" << std::endl;

    
    std::cout << "Time taken by function 2: "
          << duration1.count() << " microseconds" << std::endl;
    
    return 0;
}

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int> golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

As output I get this:

4
4
Time taken by function 1: 687 microseconds // this is golomS
Time taken by function 2: 242 microseconds // this is golom

Shouldn't golomS be faster with memoization? I tried stepping through with the debugger, but I'm not sure where the slowness is coming from.

My question is: how can I change my golomS method so that it is faster than golom? Thank you. -Adam

  • In your memoized version, you're passing `golomb` [by value, not by reference](https://stackoverflow.com/questions/410593/pass-by-reference-value-in-c). That means every time you make a call to it, it creates a _copy_ of its second argument, and changes to that copy are not seen by any other call (see the sketch after these comments). – Nathan Pierson Nov 01 '21 at 14:25
  • Benchmarking is HARD. In particular, calling the functions with a compile-time known constant `4` opens the door to a lot of compiler shenanigans. Considering `golom()` has no side effects, whereas `golomS()` does, that could be enough to account for the difference. – Nov 01 '21 at 14:27
  • That being said, even when I implement this change and compile with `-O3`, the non-memoized version still ends up being faster. It looks likely that the cost of the `unordered_map` lookups exceeds the benefits of memoizing until you start computing further into the sequence. For instance, computing the 40th element instead of the 4th shows how much better the memoized version scales. – Nathan Pierson Nov 01 '21 at 14:34
  • Try golom(20). In the debugger, how often should golomS be called? Exactly 20 times, because of the recursive call to golomS(n-1, ...). How often is it called? How often does it find a value in the unordered_map? Never. – gnasher729 Nov 01 '21 at 15:09
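
For illustration, here is a minimal, self-contained sketch of the by-value vs. by-reference behaviour described in the comments above (the helper names are made up, not from the original post):

#include <iostream>
#include <unordered_map>

// Takes the map by value: the function works on a private copy,
// so the insertion is lost when the copy is destroyed.
void insertByValue(std::unordered_map<int, int> m) { m[1] = 1; }

// Takes the map by reference: the insertion is visible to the caller.
void insertByRef(std::unordered_map<int, int>& m) { m[1] = 1; }

int main()
{
    std::unordered_map<int, int> cache;
    insertByValue(cache);
    std::cout << cache.size() << '\n';  // prints 0 -- the copy was thrown away
    insertByRef(cache);
    std::cout << cache.size() << '\n';  // prints 1 -- the caller sees the insertion
}

This is exactly why the recursive calls in golomS never see each other's cached results when the map is passed by value.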

4 Answers

6

On top of the other answers, I would like to add that this could really benefit from proper benchmarking.

In order to get reliable results you want to run the tests multiple times, and take steps to ensure that memory caches and other system-level shenanigans aren't interfering with the results.

Thankfully, there are libraries out there that can handle most of these complexities for us. For example, here's a better benchmark of your code using Google's Benchmark library:

N.B. the fixes proposed by the other answers have been integrated.

#include <benchmark/benchmark.h>
#include <chrono>
#include <unordered_map>

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}


static void TestGolomb(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golom(state.range(0)));
  }
}
BENCHMARK(TestGolomb)->DenseRange(1, 17, 2);

static void TestGolombS(benchmark::State& state) {
  for (auto _ : state) {
    // Make sure we always start from a fresh map
    std::unordered_map<int, int> golomb;

    // Ignore construction and destruction of the map
    auto start = std::chrono::high_resolution_clock::now();
    benchmark::DoNotOptimize(golomS(state.range(0), golomb));
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(TestGolombS)->DenseRange(1, 17, 2)->UseManualTime();

This gives us the following profile, suggesting that the break-even point is around 14-15:

Results

see on quick-bench

  • Thank you for showing this to me, will definitely try this out as well! This is sweet. – Adam Kostandy Nov 01 '21 at 14:53
  • @AdamKostandy Don't read too much into quick-bench results though. You should be doing this in a controlled environment if you want actual accuracy. – Nov 01 '21 at 14:54
  • @Frank I would really be interested in a benchmark of my post (see below). Time complexity is O(1). Maybe you find the time to add it . . . – A M Nov 02 '21 at 19:17
  • @ArminMontigny You can edit and run the benchmark for yourself via the link at the bottom of my answer. It's really straightforward. –  Nov 02 '21 at 19:25
  • I did it and copy-pasted the picture below. See for yourself . . . – A M Nov 02 '21 at 19:41
  • Just a tiny side note about using Google Benchmark: you can pause and resume the timer inside the measurement loop through `state.PauseTiming();` and `state.ResumeTiming();`, as [described here](https://github.com/google/benchmark/blob/main/docs/user_guide.md#controlling-timers). This way you can exclude construction/destruction of the `unordered_map` without using manual timers from `chrono`. For example [like here](https://quick-bench.com/q/ewISYl0Zp3nKSf7gXYgg5dA7tEU); a rough sketch of that pattern follows below. – Arty Nov 05 '21 at 13:59
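
As a minimal sketch of the `PauseTiming()`/`ResumeTiming()` approach mentioned in the last comment (this follows the pattern from the Google Benchmark user guide, not the exact code behind the link; the benchmark name is made up):

static void TestGolombS_Paused(benchmark::State& state) {
  std::unordered_map<int, int> golomb;
  for (auto _ : state) {
    state.PauseTiming();   // exclude resetting the memo table from the measurement
    golomb.clear();
    state.ResumeTiming();

    benchmark::DoNotOptimize(golomS(state.range(0), golomb));
  }
}
BENCHMARK(TestGolombS_Paused)->DenseRange(1, 17, 2);

Like the `SetIterationTime` approach above, this only changes what is measured, not how fast `golomS` itself runs.
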
1
int golomS(int n, std::unordered_map<int, int> golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

In your golomS function, you pass golomb by value, not by reference. That means every time you call golomS, a copy of golomb is made and then destroyed when it goes out of scope, so the actual golomb map is never changed.

You should pass by reference instead.

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}
  • Have you tested that making this change does, in fact, result in `golomS()` being faster than `golom()` in the OP's test bench? –  Nov 01 '21 at 14:32
  • Thank you!! I was assuming that unordered maps were like arrays and were always passed by reference. This makes a lot more sense now, thank you! – Adam Kostandy Nov 01 '21 at 14:35
  • @Frank apparently on my machine, both functions run in 0 microseconds. Will test that with larger data though. – justANewb stands with Ukraine Nov 01 '21 at 14:36
  • After doing a test, it does indeed end up faster for larger values. – Adam Kostandy Nov 01 '21 at 14:36
  • C-style arrays are weird and very few other things in the language have the same behavior. Strictly speaking they aren't actually passed by reference either: they experience [array-to-pointer decay](https://stackoverflow.com/questions/1461432/what-is-array-to-pointer-decay) and the pointer is passed by value (see the sketch after these comments). – Nathan Pierson Nov 01 '21 at 14:36
  • `12 12 Time taken by function 1: 184 microseconds Time taken by function 2: 10124 microseconds` – Adam Kostandy Nov 01 '21 at 14:37
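
To illustrate the array-to-pointer decay mentioned in the comments above, here is a small self-contained sketch (the function and variable names are made up):

#include <iostream>

// The parameter type "int arr[3]" decays to "int* arr": the pointer is
// passed by value, but it still points at the caller's elements.
void fillArray(int arr[3]) { arr[0] = 42; }

int main()
{
    int data[3] = {1, 2, 3};
    fillArray(data);
    std::cout << data[0] << '\n';  // prints 42 -- the caller's array was modified
}

A std::unordered_map parameter, by contrast, really is copied when passed by value, as shown in the earlier sketch.
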
1

There are two issues in your code. The first and main issue is that you pass the unordered map by value. You have to pass it by reference to avoid copying the whole map on every function call. Also, if you pass the map by value, the newly computed values are not preserved on return; if you pass it by reference, the new values benefit the calling functions.

The second important issue is that you use the very small value 4 as the function input when measuring time. Computing the result for 4 takes very little time; you should use a bigger value like 50, so that the computation takes long enough to give a meaningful measurement.

The fixed version of your code already shows greatly improved timings for the first function compared to the second (for input 50):

13
13
Time taken by function 1: 110 microseconds
Time taken by function 2: 118345 microseconds

Fixed code:

Try it online!

#include <unordered_map>
#include <iostream>
#include <chrono>

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int> & golomb)
{
    if(n == 1)
        return 1; 
    if(golomb.find(n) == golomb.end())
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    return golomb[n];
}

int main(int argc, char* argv[])
{
    std::unordered_map<int, int> hashTable;
    
    auto start = std::chrono::high_resolution_clock::now(); 
    std::cout << golomS(50, hashTable) << std::endl;
    auto stop = std::chrono::high_resolution_clock::now();
    
    auto start1 = std::chrono::high_resolution_clock::now();
    std::cout << golom(50) << std::endl;
    auto stop1 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(stop1 - start1);

    std::cout << "Time taken by function 1: "
           << duration.count() << " microseconds" << std::endl;
    
    std::cout << "Time taken by function 2: "
          << duration1.count() << " microseconds" << std::endl;
    
    return 0;
}
Arty
1

Very good. Please try and continue to learn about memoization.

I will give an additional method. You can also "memoize" values in a DP table, DP as in dynamic programming.

If you go for speed and can afford high memory consumption, then this is the fastest and most efficient solution. Using recursion for this sequence is basically a bad idea; it results in very slow code and potential stack overflows.

Converting the recursive solution into an iterative one with memoization in a DP table gives a very fast result, even for big numbers. With compiler optimizations on, you can calculate the complete sequence up to, for example, 100000 very quickly.

See below:

#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 100000;

using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// This is the Golomb Sequence
GolombSequence golombSequence{};

// Iterative Function to calculate a Golomb sequence using a DP table
void createGolombSequence() {

    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
}

// Test
int main() {
   createGolombSequence();

   std::cout << golombSequence[777] << '\n';
   std::cout << golombSequence[7777] << '\n';
   std::cout << golombSequence[77777] << '\n';
}

We can even go one step further and use "compile-time memoization", meaning we calculate the complete sequence at compile time. So, let the compiler do the work.

This will of course result in the fastest possible solution, with the downsides of high memory consumption and being somewhat limited by the capabilities of the compiler.

Please see below. All Golomb value calculation is done at compile time. At runtime, we just use an ultrafast indexing operation:

#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 100000u;
using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// Iterative Function to calculate a Golomb sequence using a DP table
constexpr GolombSequence createGolombSequence() {

    GolombSequence golombSequence{};
    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
    return golombSequence;
}
// This is the Golomb Sequence
constexpr GolombSequence golombSequence = createGolombSequence();

// Test
int main() {

   std::cout << golombSequence[777] << '\n';
   std::cout << golombSequence[7777] << '\n';
   std::cout << golombSequence[77777] << '\n';
}

I added my code to Frank's benchmark. The result dramatically outperforms everything else. The green line at 0 ms is the above algorithm.

[benchmark results plot]


Used benchmark code

#include <benchmark/benchmark.h>
#include <chrono>
#include <unordered_map>
#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 10000u;
using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// Iterative Function to calculate a Golomb sequence using a DP table
constexpr GolombSequence createGolombSequence() {

    GolombSequence golombSequence{};
    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
    return golombSequence;
}
// This is the Golomb Sequence
constexpr GolombSequence golombSequence = createGolombSequence();

inline int golombFast(size_t i) { return golombSequence[i]; }


int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

static void TestGolombFast(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golombFast(state.range(0)));
  }
}
BENCHMARK(TestGolombFast)->DenseRange(1, 17, 2);



static void TestGolomb(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golom(state.range(0)));
  }
}
BENCHMARK(TestGolomb)->DenseRange(1, 17, 2);

static void TestGolombS(benchmark::State& state) {
  for (auto _ : state) {
    std::unordered_map<int, int> golomb;

    // Be extra generous: ignore construction and destruction of the map
    auto start = std::chrono::high_resolution_clock::now();
    benchmark::DoNotOptimize(golomS(state.range(0), golomb));
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(TestGolombS)->DenseRange(1, 17, 2)->UseManualTime();

A M
  • The OP's code is already memoized. The question is about why the memoized version was not outperforming the base algorithm in their test. –  Nov 02 '21 at 19:27
  • The answer is that it was wrongly implemented and the wrong algorithm has been chosen . . . – A M Nov 02 '21 at 19:43