2

I have an array of certain length (say 8) 1,2,2,3,3,4,5,6

I wish to find number of subsequences of this array of length (say 4). This is 8 choose 4 (8C4) = 70.

However, in the list of 70 subsequences above, I do not want to count sequences having duplicate elements e.g. 1223 1224 1334 2233 should be removed.

I have a solution for a single element being duplicated (i.e. array is 1,2,2,3,4,5,6,7) Here the duplicates are 2C2*6C2 = 15 and answer required is 55

Is there a general formula/algorithm for moderately large array size and len (max hundred thousand). For the general case, the answer is calculated in modulo format.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
Ranon
  • 99
  • 4
  • @DavidEisenstat This is not the same problem as the one you linked to because their definitions of "distinct" differ. This counts the number of ways to *make& a subsequence without duplicates. That one counted the number of different subsequences that you can make out of a string, with duplicates. – btilly Sep 13 '19 at 15:31

2 Answers2

2

This is a straightforward dynamic programming problem.

Keep an array of the current number of ways to have currently chosen i of the elements in your subsequence. Go through the values and counts, and if c is the count of how many of a particular value there is, then the number of ways of winding up with i is the ways to do it without this value, plus the ways to have done i-1 already times the number of ways to choose this value.

Here is working Python code:

def count_distinct_subseq (seq, k, modulo=None):
    count = {}
    for s in seq:
        if s in count:
            count[s] = count[s] + 1
        else:
            count[s] = 1

    # # of ways to currently have i chosen.
    ways = [0 for i in range(k+1)]
    ways[0] = 1

    for s, c in count.iteritems():
        for i in range(k, 0, -1):
            ways[i] = ways[i] + ways[i-1] * c
            if modulo is not None:
                ways[i] = ways[i] % modulo

    return ways[k]

print(count_distinct_subseq([1, 2, 2, 3, 4, 5, 6, 7], 4))

Pass a third parameter to do your calculations modulo a smaller one.

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
btilly
  • 43,296
  • 3
  • 59
  • 88
  • @DavidEisenstat Look at the example described in the question. It is worded confusingly, but 55 is the number of ways to get a subsequence that is distinct, not the number of distinct subsequences that you can get. Similarly in your example, you get a subsequence that is distinct using indexes `(0, 3), (1, 3), (2, 3)` so the answer 3 is in fact the number of ways you get a distinct subsequence, even though there is only one subsequence that you can get out of it. – btilly Sep 13 '19 at 20:29
  • I also pointed this out in my comment above about why this is not actually a duplicate of the post that you thought it was a duplicate of. – btilly Sep 13 '19 at 20:30
1

Here's the recurrence btilly discussed, coded in JavaScript:

function f(A, k, i=A.length-1){
  if (k > i + 1 || k == 0)
    return [0, []]
    
  if (i == 0)
    return [1, [[[A[i]], 1]]]
    
  const [totalK, combsK] = f(A, k, i - 1)
  const [total, combs] = f(A, k - 1, i - 1)
   
  let newCombs = []
  let count = totalK
  
  for ([s, r] of combs){
    if (!s.includes(A[i])){
      let _s = s.slice()
      _s.push(A[i])
      count += r
      newCombs.push([_s, r])
    }
  }
  if (!newCombs.length && k == 1){
    newCombs = [[[A[i]], 1]]
    count += 1
  }
  return [count, combsK.concat(newCombs)]
}

var A = ["a", "a", "a", "b"]
console.log(JSON.stringify(f(A, 2)))

A = [1, 2, 2, 3, 4, 5, 6, 7]
console.log(JSON.stringify(f(A, 4)))
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61