Using matrices to find the number of different ways to write n as the sum of 1, 3, and 4?

Question

This is a question given in this presentation. Dynamic Programming

now i have implemented the algorithm using recursion and it works fine for small values. But when n is greater than 30 it becomes really slow.The presentation mentions that for large values of n one should consider something similar to the matrix form of Fibonacci numbers .I am having trouble undestanding how to use the matrix form of Fibonacci numbers to come up with a solution.Can some one give me some hints or pseudocode

Thanks

That makes sense, when you're doing dynamic programming you'll generally want to use [*memoization*](https://en.wikipedia.org/wiki/Memoization) to avoid re-calculating the same values over and over again — Nir Alfasi, Dec 12 '16 at 23:16
Do you consider sums different if they have different numbers in them, or different orders of numbers in them? For example, is 1 + 3 the same as 3 + 1? — templatetypedef, Dec 12 '16 at 23:50
@alfasin: This may just be a terminological difference, but I'd define "dynamic programming" as being when you populate the array in a deterministic/methodical way starting from "the bottom" (hence the alternative name "bottom-up recursion"), and "memoization" as when you just use standard top-down recursive strategy, but tack on result-caching so that already-calculated intermediate results can be reused. So they're related, but mutually exclusive approaches; neither uses the other. — ruakh, Dec 13 '16 at 01:01
@ruakh thank god for SO... :))) http://stackoverflow.com/questions/6184869/what-is-difference-between-memoization-and-dynamic-programming — Nir Alfasi, Dec 13 '16 at 01:03
also see my blogpost on this: http://www.robert-king.com/posts/20111103_grid_tours/ — Rusty Rob, Dec 13 '16 at 01:41

templatetypedef · Answer 1 · 2016-12-13T01:18:16.757

Yes, you can use the technique from fast Fibonacci implementations to solve this problem in time O(log n)! Here's how to do it.

Let's go with your definition from the problem statement that 1 + 3 is counted the same as 3 + 1. Then you have the following recurrence relation:

A(0) = 1
A(1) = 1
A(2) = 1
A(3) = 2
A(k+4) = A(k) + A(k+1) + A(k+3)

The matrix trick here is to notice that

 | 1  0  1  1 | |A( k )|   |A(k) + A(k-2) + A(k-3)|   |A(k+1)|
 | 1  0  0  0 | |A(k-1)|   |         A( k )       |   |A( k )|
 | 0  1  0  0 | |A(k-2)| = |         A(k-1)       | = |A(k-1)|
 | 0  0  1  0 | |A(k-3)|   |         A(k-2)       | = |A(k-2)|

In other words, multiplying a vector of the last four values in the series produces a vector with those values shifted forward by one step.

Let's call that matrix there M. Then notice that

     |A( k )|   |A(k+2)|
     |A(k-1)|   |A(k+1)|
 M^2 |A(k-2)| = |A( k )|
     |A(k-3)|   |A(k-1)|

In other words, multiplying by the square of this matrix shifts the series down two steps. More generally:

     |A( k )|   |  A(k+n)  |
     |A(k-1)|   |A(k-1 + n)|
 M^n |A(k-2)| = |A(k-2 + n)|
     |A(k-3)|   |A(k-3 + n)|

So multiplying by Mⁿ shifts the series down n steps. Now, if we want to know the value of A(n+3), we can just compute

     |A(3)|   |A(n+3)|
     |A(2)|   |A(n+2)|
 M^n |A(1)| = |A(n+1)|
     |A(0)|   |A(n+2)|

and read off the top entry of the vector! This can be done in time O(log n) by using exponentiation by squaring. Here's some code that does just that. This uses a matrix library I cobbled together a while back:

#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;

/* Naive implementations of A. */
uint64_t naiveA(int n) {
    if (n == 0) return 1;
    if (n == 1) return 1;
    if (n == 2) return 1;
    if (n == 3) return 2;
    return naiveA(n-1) + naiveA(n-3) + naiveA(n-4);
}

/* Constructs and returns the giant matrix. */
Matrix<4, 4, uint64_t> M() {
  Matrix<4, 4, uint64_t> result;
  fill(result.begin(), result.end(), uint64_t(0));

  result[0][0] = 1;
  result[0][2] = 1;
  result[0][3] = 1;
  result[1][0] = 1;
  result[2][1] = 1;
  result[3][2] = 1;

  return result;
}

/* Constructs the initial vector that we multiply the matrix by. */
Vector<4, uint64_t> initVec() {
  Vector<4, uint64_t> result;
  result[0] = 2;
  result[1] = 1;
  result[2] = 1;
  result[3] = 1;
  return result;
}

/* O(log n) time for raising a matrix to a power. */
Matrix<4, 4, uint64_t> fastPower(const Matrix<4, 4, uint64_t>& m, int n) {
  if (n == 0) return Identity<4, uint64_t>();
  auto half = fastPower(m, n / 2);

  if (n % 2 == 0) return half * half;
  else return half * half * m;
}

/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
  if (n == 0) return 1;
  if (n == 1) return 1;
  if (n == 2) return 1;
  if (n == 3) return 2;

  auto result = fastPower(M(), n - 3) * initVec();
  return result[0];
}

/* Some simple test code showing this in action! */
int main() {
  for (int i = 0; i < 25; i++) {
    cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
  }
}

Now, how would this change if 3 + 1 and 1 + 3 were treated as equivalent? This means that we can think about solving this problem in the following way:

Let A(n) be the number of ways to write n as a sum of 1s, 3s, and 4s.
Let B(n) be the number of ways to write n as a sum of 1s and 3s.
Let C(n) be the number of ways to write n as a sum of 1s.

We then have the following:

A(n) = B(n) for all n ≤ 3, since for numbers in that range the only options are to use 1s and 3s.
A(n + 4) = A(n) + B(n + 4), since your options are either (1) use a 4 or (2) not use a 4, leaving the remaining sum to use 1s and 3s.
B(n) = C(n) for all n ≤ 2, since for numbers in that range the only options are to use 1s.
B(n + 3) = B(n) + C(n + 3), sine your options are either (1) use a 3 or (2) not use a 3, leaving the remaining sum to use only 1s.
C(0) = 1, since there's only one way to write 0 as a sum of no numbers.
C(n+1) = C(n), since the only way to write something with 1s is to pull out a 1 and write the remaining number as a sum of 1s.

That's a lot to take in, but do notice the following: we ultimately care about A(n), and to evaluate it, we only need to know the values of A(n), A(n-1), A(n-2), A(n-3), B(n), B(n-1), B(n-2), B(n-3), C(n), C(n-1), C(n-2), and C(n-3).

Let's imagine, for example, that we know these twelve values for some fixed value of n. We can learn those twelve values for the next value of n as follows:

C(n+1) = C(n)
B(n+1) = B(n-2) + C(n+1) = B(n-2) + C(n)
A(n+1) = A(n-3) + B(n+1) = A(n-3) + B(n-2) + C(n)

And the remaining values then shift down.

We can formulate this as a giant matrix equation:

  A( n ) A(n-1) A(n-2) A(n-3) B( n ) B(n-1) B(n-2) C( n )
|    0      0      0      1      0      0      1      1   | |A( n )| = |A(n+1)|
|    1      0      0      0      0      0      0      0   | |A(n-1)| = |A( n )|
|    0      1      0      0      0      0      0      0   | |A(n-2)| = |A(n-1)|
|    0      0      1      0      0      0      0      0   | |A(n-3)| = |A(n-2)|
|    0      0      0      0      0      0      1      1   | |B( n )| = |B(n+1)|
|    0      0      0      0      1      0      0      0   | |B(n-1)| = |B( n )|
|    0      0      0      0      0      1      0      0   | |B(n-2)| = |B(n-1)|
|    0      0      0      0      0      0      0      1   | |C( n )| = |C(n+1)|

Let's call this gigantic matrix here M. Then if we compute

     |2|  // A(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
     |1|  // A(2) = 1, since 2 = 1 + 1
     |1|  // A(1) = 1, since 1 = 1
 M^n |1|  // A(0) = 1, since 0 = (empty sum)
     |2|  // B(3) = 2, since 3 = 3 or 3 = 1 + 1 + 1
     |1|  // B(2) = 1, since 2 = 1 + 1
     |1|  // B(1) = 1, since 1 = 1
     |1|  // C(3) = 1, since 3 = 1 + 1 + 1

We'll get back a vector whose first entry is A(n+3), the number of ways to write n+3 as a sum of 1's, 3's, and 4's. (I've actually coded this up to check it - it works!) You can then use the technique for computing Fibonacci numbers using a matrix to a power efficiently that you saw with Fibonacci numbers to solve this in time O(log n).

Here's some code doing that:

#include "Matrix.hh"
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <algorithm>
using namespace std;

/* Naive implementations of A, B, and C. */
uint64_t naiveC(int n) {
  return 1;
}

uint64_t naiveB(int n) {
  return (n < 3? 0 : naiveB(n-3)) + naiveC(n);
}

uint64_t naiveA(int n) {
  return (n < 4? 0 : naiveA(n-4)) + naiveB(n);
}

/* Constructs and returns the giant matrix. */
Matrix<8, 8, uint64_t> M() {
  Matrix<8, 8, uint64_t> result;
  fill(result.begin(), result.end(), uint64_t(0));

  result[0][3] = 1;
  result[0][6] = 1;
  result[0][7] = 1;
  result[1][0] = 1;
  result[2][1] = 1;
  result[3][2] = 1;
  result[4][6] = 1;
  result[4][7] = 1;
  result[5][4] = 1;
  result[6][5] = 1;
  result[7][7] = 1;

  return result;
}

/* Constructs the initial vector that we multiply the matrix by. */
Vector<8, uint64_t> initVec() {
  Vector<8, uint64_t> result;
  result[0] = 2;
  result[1] = 1;
  result[2] = 1;
  result[3] = 1;
  result[4] = 2;
  result[5] = 1;
  result[6] = 1;
  result[7] = 1;
  return result;
}

/* O(log n) time for raising a matrix to a power. */
Matrix<8, 8, uint64_t> fastPower(const Matrix<8, 8, uint64_t>& m, int n) {
  if (n == 0) return Identity<8, uint64_t>();
  auto half = fastPower(m, n / 2);

  if (n % 2 == 0) return half * half;
  else return half * half * m;
}

/* Fast implementation of A(n) using matrix exponentiation. */
uint64_t fastA(int n) {
  if (n == 0) return 1;
  if (n == 1) return 1;
  if (n == 2) return 1;
  if (n == 3) return 2;

  auto result = fastPower(M(), n - 3) * initVec();
  return result[0];
}

/* Some simple test code showing this in action! */
int main() {
  for (int i = 0; i < 25; i++) {
    cout << setw(2) << i << ": " << naiveA(i) << ", " << fastA(i) << endl;
  }
}

thanks for the detailed explanation. but 1+3 and 3+1 should be treated as two unique entities — user119020, Dec 13 '16 at 01:07
"you can use the technique from fast Fibonacci implementations to solve this problem in time O(log n)" - well, you can in a model of computation where arithmetic is constant time. With how huge the results get, the constant-time arithmetic assumption quickly breaks down in practice. — user2357112, Dec 13 '16 at 01:11
@user2357112 That's absolutely true. I'm assuming that we'd be working with fixed-width numbers and working modulo some large prime, since this looks like a programming-contest problem. :-) — templatetypedef, Dec 13 '16 at 01:12
Opposite to what you say, I think the first part of your answer gives the result when 1+3 and 3+1 are treated as different, and the second part when they're treated as the same. — Paul Hankin, Dec 13 '16 at 01:17
@ templatetypedef. Thanks!! i am going through your solution. you have said Let A(n) be the number of ways to write n as a sum of 1s, 3s, and 4s. Let B(n) be the number of ways to write n as a sum of 1s and 3s. Let C(n) be the number of ways to write n as a sum of 1s..Shouldn't there be D(n) = n as a sum of 3s E(n) = n as a sum of 4s F(n)= n as a sum of 1s and 4s as well — user119020, Dec 13 '16 at 01:17
@user119020 It's not actually necessary to do that. When order is irrelevant, the idea is to progressively start eliminating numbers from consideration. Going from A to B removes 4, going from B to C removes 3, and if you removed 1 there wouldn't be anything left. — templatetypedef, Dec 13 '16 at 01:19
@templatetypedef does A(k) stand for fibonacci value of k.And how do you construct the giant matrixfor other values?. lets say n as a sum of 1,2,3,4,5 — user119020, Dec 13 '16 at 01:53
A(k) is the number of ways to write k as a sum using 1, 3, and 4. The way to generalize this is as follows. First, write a recurrence relation and look at how many terms you need to remember to evaluate it (call it n). Then write a matrix that when multiplied by a vector of the last n values, creates a new vector that "shifts" in the next value and "shifts" out the oldest one. Finally, raise the matrix to a sufficient large power using exponention by squaring. — templatetypedef, Dec 13 '16 at 02:01
@templatetypedef. okay i think im getting the hang of it. but in you example, after matrix multiplication you have written A(k+1) = A(K)+ A(k-2) + A(k-3) shouldn't that be A(k+1) = A(K)+ A(k-1) + A(k-2) or am i missing something — user119020, Dec 13 '16 at 02:38
I think it's right as written. The first number in the sum is either 1, 3, or 4, so if you want to know A(k+1), you'd need A((k+1) - 1), A((k+1) - 3), or A((k+1) - 4), which are A(k), A(k-2), and A(k-3). — templatetypedef, Dec 13 '16 at 02:54

score 1 · Answer 2 · edited May 23 '17 at 12:09

This is a very interesting sequence. It is almost but not quite the order-4 Fibonacci (a.k.a. Tetranacci) numbers. Having extracted the doubling formulas for Tetranacci from its companion matrix, I could not resist doing it again for this very similar recurrence relation.

Before we get into the actual code, some definitions and a short derivation of the formulas used are in order. Define an integer sequence A such that:

A(n) := A(n-1) + A(n-3) + A(n-4)

with initial values A(0), A(1), A(2), A(3) := 1, 1, 1, 2.

For n >= 0, this is the number of integer compositions of n into parts from the set {1, 3, 4}. This is the sequence that we ultimately wish to compute.

For convenience, define a sequence T such that:

T(n) := T(n-1) + T(n-3) + T(n-4)

with initial values T(0), T(1), T(2), T(3) := 0, 0, 0, 1.

Note that A(n) and T(n) are simply shifts of each other. More precisely, A(n) = T(n+3) for all integers n. Accordingly, as elaborated by another answer, the companion matrix for both sequences is:

[0  1  0  0]
[0  0  1  0]
[0  0  0  1]
[1  1  0  1]

Call this matrix C, and let:

a, b, c, d := T(n), T(n+1), T(n+2), T(n+3)

a', b', c', d' := T(2n), T(2n+1), T(2n+2), T(2n+3)

By induction, it can easily be shown that:

[0  1  0  0]^n = [d-c-a  c-b  b-a  a]
[0  0  1  0]     [  a    d-c  c-b  b]
[0  0  0  1]     [  b    b+a  d-c  c]
[1  1  0  1]     [  c    c+b  b+a  d]

As seen above, for any n, C^n can be fully determined from its rightmost column alone. Furthermore, multiplying C^n with its rightmost column produces the rightmost column of C^(2n):

[d-c-a  c-b  b-a  a][a] = [a'] = [a(2d - 2c - a) + b(2c - b)]
[  a    d-c  c-b  b][b]   [b']   [     a^2 + c^2 + 2b(d - c)]
[  b    b+a  d-c  c][c]   [c']   [     b(2a + b) + c(2d - c)]
[  c    c+b  b+a  d][d]   [d']   [     b^2 + d^2 + 2c(a + b)]

Thus, if we wish to compute C^n for some n by repeated squaring, we need only perform matrix-vector multiplication per step instead of the full matrix-matrix multiplication.

Now, the implementation, in Python:

# O(n) integer additions or subtractions
def A_linearly(n): 
    a, b, c, d = 0, 0, 0, 1 # T(0), T(1), T(2), T(3)

    if n >= 0:
        for _ in range(+n):
            a, b, c, d = b, c, d, a + b + d
    else: # n < 0
        for _ in range(-n):
            a, b, c, d = d - c - a, a, b, c

    return d # because A(n) = T(n+3)

# O(log n) integer multiplications, additions, subtractions.
def A_by_doubling(n):
    n += 3 # because A(n) = T(n+3)

    if n >= 0:
        a, b, c, d = 0, 0, 0, 1 # T(0), T(1), T(2), T(3)
    else: # n < 0
        a, b, c, d = 1, 0, 0, 0 # T(-1), T(0), T(1), T(2)

    # Unroll the final iteration to avoid computing extraneous values
    for i in reversed(range(1, abs(n).bit_length())):
        w = a*(2*(d - c) - a) + b*(2*c - b)
        x = a*a + c*c + 2*b*(d - c)
        y = b*(2*a + b) + c*(2*d - c)
        z = b*b + d*d + 2*c*(a + b)

        if (n >> i) & 1 == 0:
            a, b, c, d = w, x, y, z
        else: # (n >> i) & 1 == 1
            a, b, c, d = x, y, z, w + x + z

    if n & 1 == 0:
        return a*(2*(d - c) - a) + b*(2*c - b) # w
    else: # n & 1 == 1
        return a*a + c*c + 2*b*(d - c)         # x


print(all(A_linearly(n) == A_by_doubling(n) for n in range(-1000, 1001)))

Because it was rather trivial to code, the sequence is extended to negative n in the usual way. Also provided is a simple linear implementation to serve as a point of reference.

For n large enough, the logarithmic implementation above is 10-20x faster than directly exponentiating the companion matrix with numpy, by a simple (i.e. not rigorous, and likely flawed) timing comparison. And by my estimate, it would still take ~100 years to compute A(10**12)! Even though the algorithm above has room for improvement, that number is simply too large. On the other hand, computing A(10**12) mod M for some M is much more attainable.

A direct relation to Lucas and Fibonacci numbers

It turns out that T(n) is even closer to the Fibonacci and Lucas numbers than it is to Tetranacci. To see this, note that the characteristic polynomial for T(n) is x^4 - x^3 - x - 1 = 0 which factors into (x^2 - x - 1)(x^2 + 1) = 0. The first factor is the characteristic polynomial for Fibonacci & Lucas! The 4 roots of (x^2 - x - 1)(x^2 + 1) = 0 are the two Fibonacci roots, phi and psi = 1 - phi, and i and -i--the two square roots of -1.

The closed-form expression or "Binet" formula for T(n) will have the general form:

T(n) = U(n) + V(n)
U(n) = p*(phi^n) + q*(psi^n)
V(n) = r*(i^n) + s*(-i)^n

for some constant coefficients p, q, r, s.

Using the initial values for T(n), solving for the coefficients, applying some algebra, and noting that the Lucas numbers have the closed-form expression: L(n) = phi^n + psi^n, we can derive the following relations:

       L(n+1) - L(n)    L(n-1)   F(n) + F(n-2)
U(n) = ------------- = -------- = ------------
             5            5           5

where L(n) is the n'th Lucas number with L(0), L(1) := 2, 1 and F(n) is the n'th Fibonacci number with F(0), F(1) := 0, 1. And we also have:

V(n) =  1 / 5   if n = 0 (mod 4)
     | -2 / 5   if n = 1 (mod 4)
     | -1 / 5   if n = 2 (mod 4)
     |  2 / 5   if n = 3 (mod 4)

Which is ugly, but trivial to code. Note that the numerator of V(n) can also be succinctly expressed as cos(n*pi/2) - 2sin(n*pi/2) or (3-(-1)^n) / 2 * (-1)^(n(n+1)/2), but we use the piece-wise definition for clarity.

Here's an even nicer, more direct identity:

T(n) + T(n+2) = F(n)

Essentially, we can compute T(n) (and therefore A(n)) by using Fibonacci & Lucas numbers. Theoretically, this should be much more efficient than the Tetranacci-like approach.

It is known that the Lucas numbers can computed more efficiently than Fibonacci, therefore we will compute A(n) from the Lucas numbers. The most efficient, simple Lucas number algorithm I know of is one by L.F. Johnson (see his 2010 paper: Middle and Ripple, fast simple O(lg n) algorithms for Lucas Numbers). Once we have a Lucas algorithm, we use the identity: T(n) = L(n - 1) / 5 + V(n) to compute A(n).

# O(log n) integer multiplications, additions, subtractions
def A_by_lucas(n):
    n += 3 # because A(n) = T(n+3)
    offset = (+1, -2, -1, +2)[n % 4]
    L = lf_johnson_2010_middle(n - 1)
    return (L + offset) // 5

def lf_johnson_2010_middle(n):
    "-> n'th Lucas number. See [L.F. Johnson 2010a]."
    #: The following Lucas identities are used:
    #:
    #:      L(2n)   = L(n)^2 - 2*(-1)^n
    #:      L(2n+1) = L(2n+2) - L(2n)
    #:      L(2n+2) = L(n+1)^2 - 2*(-1)^(n+1)
    #:
    #: The first and last identities are equivalent.
    #: For the unrolled iteration, the following is also used:
    #:
    #:      L(2n+1) = L(n)*L(n+1) - (-1)^n
    #:
    #: Since this approach uses only square multiplications per loop,
    #: It turns out to be slightly faster than standard Lucas doubling,
    #: which uses 1 square and 1 regular multiplication.
    if n >= 0:
        a, b, sign = 2, 1, +1  # L(0), L(1), (-1)^0
    else: # n < 0
        a, b, sign = -1, 2, -1 # L(-1), L(0), (-1)^(-1)

    # unroll the last iteration to avoid computing unnecessary values
    for i in reversed(range(1, abs(n).bit_length())):
        a = a*a - 2*sign # L(2k)
        c = b*b + 2*sign # L(2k+2)
        b = c - a        # L(2k+1)
        sign = +1

        if (n >> i) & 1:
            a, b = b, c
            sign = -1

    if n & 1:
        return a*b - sign
    else:
        return a*a - 2*sign

You may verify that A_by_lucas produces the same results as the previous A_by_doubling function, but is roughly 5x faster. Still not fast enough to compute A(10**12) in any reasonable amount of time!

Dmitry Bychenko · Answer 3 · 2016-12-13T08:02:15.003

You can easily improve your current recursion implementation by adding memoization which makes the solution fast again. C# code:

// Dictionary to store computed values
private static Dictionary<int, long> s_Solutions = new Dictionary<int, long>();

private static long Count134(int value) {
  if (value == 0)
    return 1;
  else if (value <= 0)
    return 0;

  long result;

  // Improvement: Do we have the value computed? 
  if (s_Solutions.TryGetValue(value, out result))
    return result;

  result = Count134(value - 4) + 
           Count134(value - 3) + 
           Count134(value - 1);

  // Improvement: Store the value computed for future use
  s_Solutions.Add(value, result);

  return result;
}

And so you can easily call

Console.Write(Count134(500));

The outcome (which takes about 2 milliseconds) is

3350159379832610737

Using matrices to find the number of different ways to write n as the sum of 1, 3, and 4?

3 Answers3

A direct relation to Lucas and Fibonacci numbers