The algorithm compute a custom Tribonacci sequence modulus 10000000007. This can be computed very efficiently in O(log n)
time and O(1)
space based on a matrix representation of the sequence like for Fibonacci. See this post and this answer for more information.
This method is very fast for big inputs (much faster than using memoization) and very memory efficient.
However, there is a catch: the implementation needs to computation the multiplication of two 64-bit integers modulo a 64-bit integer (unless the modulus can fit in a 32-bit integer). 128-bit integers are not supported on most platforms, but boost::multiprecision
can be used. See this post for more information.
Here is an implementation:
int64_t int128_mulMod(int64_t a, int64_t b, int64_t mod)
{
return int64_t((int128_t(a) * int128_t(b)) % int128_t(mod));
}
// Compute the matrix multiplication of a * b and put the result in a
void matMulMod(int64_t a[3][3], int64_t b[3][3], int64_t mod)
{
int64_t res[3][3] = {};
for(int i=0 ; i<3 ; ++i)
for(int j=0 ; j<3 ; ++j)
for(int k=0 ; k<3 ; ++k)
res[i][j] += int128_mulMod(a[i][k], b[k][j], mod);
for(int i=0 ; i<3 ; ++i)
for(int j=0 ; j<3 ; ++j)
a[i][j] = res[i][j];
}
// Compute the matrix exponentiation a^n and put the result in a
void matPowMod(int64_t a[3][3], int64_t n, int64_t mod)
{
if(n == 0 || n == 1)
return;
int64_t m[3][3] = {{ 1, 1, 1 },
{ 1, 0, 0 },
{ 0, 1, 0 }};
// Use recursion to square the matrix efficiently
matPowMod(a, n / 2, mod);
// Square the matrix
matMulMod(a, a, mod);
// Remainder case
if(n % 2)
matMulMod(a, m, mod);
// Apply the modulus on each term to prevent overflows
for(int i=0 ; i<3 ; ++i)
for(int j=0 ; j<3 ; ++j)
a[i][j] %= mod;
}
int64_t ways_fast(int64_t n)
{
int64_t a[3][3] = {{ 1, 1, 1 },
{ 1, 0, 0 },
{ 0, 1, 0 }};
if(n==1) return 1;
if(n==2) return 2;
if(n==3) return 4;
matPowMod(a, n, 10000000007ll);
return a[0][0]; // Result of the Tribonacci sequence
}
For example, on my machine, ways_fast(30'000)
is 100 times faster than computing the memoization-based method provided by @Bob__. Moreover, the memoization algorithm crashes when computing ways(60'000)
while ways_fast(200'000)
works correctly.
For more information, please read: