14

How to calculate the number of distinct sub sequences of length 3 (or in general of length k < n) in an array of length n?

Note: Two sub sequences are considered different if the order of elements in them are different.

Ex: Suppose the array A = [1, 2, 1, 1], then the answer should be 3 because there are only three distinct subsequences of length 3 as shown below:

[1, 1, 1]
[1, 2, 1]
[2, 1, 1]

Size of the array n <= 10^5, each element in the array A_i <= n.

My approach:

I figured the brute force approach, i.e., to take tuples of length 3 and insert it into a map. But this is neither space/time efficient.

Edit: This was an interview question and it said that for k = 3 the expected time and space complexity is O(n).

dodobhoot
  • 493
  • 6
  • 15
  • 2
    Shouldn't you have 1, 1, 2 as part of the solution as well? – kishore Jun 05 '18 at 14:57
  • 1
    @kishore By definition, elements of the subsequence of an array needs to occur in the same order as the original array. Hence, we cannot have `1, 1, 2`. – dodobhoot Jun 05 '18 at 15:04

2 Answers2

11

As is often the case with interview problems, there's a dynamic programming solution. Let T(m, k) be the number of distinct length-k subsequences of the first m elements. Then assuming one-based indexing on the input A, we have a 2D recurrence

T(m, 0) = 1
T(m, k) = T(m-1, k) + T(m-1, k-1) -
          ^^^^^^^^^   ^^^^^^^^^^^
     A_m not chosen   A_m chosen

            { T(i-1, k-1), if i < m is the maximum index where A_i = A_m
            { 0,           if no such index exists

The subtracted term ensures that we don't count duplicates; see https://stackoverflow.com/a/5152203/2144669 for more explanation.

The running time (with a hash map to maintain the rightmost occurrence so far of each symbol seen) is O(k n), which is O(n) for k = 3.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
4

Here's a slightly different take. We can think of the number of ways an element, m, can be kth in the subsequence as the sum of all the ways the previous occurence of any element (including m) can be (k-1)th. As we move right, however, the only update needed is for m; the other sums stay constant.

For example,

// We want to avoid counting [1,1,1], [1,2,1], etc. twice
[1, 2, 1, 1, 1]

(display the array vertically for convenience)

            <-  k  ->
[1,  ->  1: [1, 0, 0]
 2,  ->  2: [1, 1, 0]
 1,  ->  1: [1, 2, 1]
 1,  ->  1: [1, 2, 3]
 1]  ->  1: [1, 2, 3]

Now if we added another element, say 3,

...
 3]  ->  3: [1, 2, 3]

 // 1 means there is one way
 // the element, 3, can be first

 // 2 means there are 2 ways
 // 3 can be second: sum distinct
 // column k[0] = 1 + 1 = 2

 // 3 means there are 3 ways
 // 3 can be third: sum distinct
 // column k[1] = 2 + 1 = 3

Sum distinct k[2] column:

0 + 3 + 3 = 6 subsequences

[1,2,1], [2,1,1], [1,1,1]
[1,1,3], [2,1,3], [3,2,1]

The sum-distinct for each column can be updated in O(1) per iteration. The k sums for the current element (we update a single list of those for each element), take O(k), which in our case is O(1).

JavaScript code:

function f(A, k){
  A.unshift(null);
  
  let sumDistinct = new Array(k + 1).fill(0);
  let hash = {};

  sumDistinct[0] = 1;

  for (let i=1; i<A.length; i++){
    let newElement;
    
    if (!hash[A[i]]){
      hash[A[i]] = new Array(k + 1).fill(0);
      newElement = true;
    }
    
    let prev = hash[A[i]].slice();

    // The number of ways an element, m, can be k'th
    // in the subsequence is the sum of all the ways
    // the previous occurence of any element
    // (including m) can be (k-1)'th
    for (let j=1; j<=k && j<=i; j++)
      hash[A[i]][j] = sumDistinct[j - 1];

    for (let j=2; j<=k && j<=i; j++)
      sumDistinct[j] = sumDistinct[j] - prev[j] + hash[A[i]][j];

    if (newElement)
      sumDistinct[1] += 1;

    console.log(JSON.stringify([A[i], hash[A[i]], sumDistinct]))
  }

  return sumDistinct[k];
}

var arr = [1, 2, 1, 1, 1, 3, 2, 1];

console.log(f(arr, 3));
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61