5

I am trying to derive the average case running time for deterministic linear search algorithm. The algorithm searches an element x in an unsorted array A in the order A[1], A[2], A[3]...A[n]. It stops when it finds the element x or proceeds until it reaches the end of the array. I searched on wikipedia and the answer given was (n+1)/(k+1) where k is the number of times x is present in the array. I approached in another way and am getting a different answer. Can anyone please give me the correct proof and also let me know whats wrong with my method?

E(T)= 1*P(1) + 2*P(2) + 3*P(3) ....+ n*P(n) where P(i) is the probability that 
                   the algorithm runs for 'i' time (i.e. compares 'i' elements).
P(i)= (n-i)C(k-1) * (n-k)! / n! 
Here, (n-i)C(k-1) is (n-i) Choose (k-1). As the algorithm has reached the ith 
step, the rest of k-1 x's must be in the last n-i elements. Hence (n-i)C(k-i).
(n-k)! is the total number of ways of arranging the rest non x numbers, and n! 
is the total number of ways of arranging the n elements in the array.

I am not getting (n+1)/(k+1) on simplifying.

leonheess
  • 16,068
  • 14
  • 77
  • 112
Brahadeesh
  • 2,245
  • 8
  • 40
  • 58
  • Maybe I'm a moron but if the array is size N then isn't average case time N/2 ?? Nevermind ...shouldn't be commenting from my iPhone...misread the q – Kevin Feb 26 '11 at 07:50
  • 1
    @kevin Its case when there are not duplicates. But when you have duplicate, even when you are finding second occurrence (to calcuate avg complexity) you will get first in your search. – Zimbabao Feb 26 '11 at 08:23
  • 1
    @Kevin actually the average case for no multiple copy case is (n+1)/2. It can be obtained like this: 1*(1/n)+ 2*(1/n)+3*(1/n)...+n*(1/n). – Brahadeesh Feb 26 '11 at 14:50

2 Answers2

6

You've forgotten to account for the permutations of the k copies of x. The correct definition of P(i) is

P(i) = (n-i)C(k-1) * k! * (n-k)! / n! = (n-i)C(k-1) / nCk.
                     ^^

I'll turn things over to Mathematica:

In[1]:= FullSimplify[Sum[i Binomial[n-i, k-1]/Binomial[n, k], {i, 1, n}], 0 <= k <= n]

        1 + n
Out[1]= -----
        1 + k

To elaborate on my comment below: assume that all elements are distinct, let X be the set of matches, and let Y be the set of non-matches. By assumption, |X| = k and |Y| = n-k. The expected number of reads is equal to the sum over elements e of the probability that e is read.

Given a set of elements S, each element in S comes before all of the others with probability 1/|S|.

An element x in X is read if and only if it comes before every other element of X, which is probability 1/k. The total contribution of elements in X is |X| (1/k) = 1.

An element y in Y is read if and only if it comes before every element of X, which is probability 1/(k+1). The total contribution of elements in Y is |Y| (1/(k+1)) = (n-k)/(k+1).

We have 1 + (n-k)/(k+1) = (n+1)/(k+1).

user635541
  • 1,214
  • 6
  • 7
  • Note: an easier way to derive this value is to observe that exactly 1 of the x's is examined, and each of the n-k non-x's is examined iff it appears before all of the x's, which is probability 1/(k+1). We have 1 + (n-k)/(k+1) = (n+1)/(k+1). – user635541 Feb 26 '11 at 14:08
  • @user635541 Thank you for the result. I did think of permuting the x's. But then they give rise to mutiple copies of the same arrays. Hence, I decided against them. Could you please clarify the justification in using k! ? Also, could you elaborate the comment? I did not get it. Sorry. – Brahadeesh Feb 26 '11 at 14:48
  • 1
    The problem is that the n! factor also gives rise to k! copies of the same arrays (assuming that the non-x's are distinct). – user635541 Feb 26 '11 at 15:14
  • @Brahadeesh What about the elements which come before i, should not we permute them also? – V K Jul 17 '20 at 04:18
  • I am not able to solve this binomial summation, can you suggest any way to simplify this? – V K Jul 17 '20 at 04:50
  • Could you explain how do you get "An element y in Y is read if and only if it comes before every element of X, which is probability 1/(k+1)" 1/(k + 1) as the probability for reading an element in Y? – DDG Jun 20 '23 at 23:06
6

Here is a solution that uses Cormen terms: Let S be the set of the other n-k elements.
Let the indicator random variable Xi=1, if we encounter the i'th element of the set S in the course of our execution.
Pr{Xi=1}=1/(k+1) and therefore E[Xi]=1/(k+1).
Let the indicator random variable Y=1, if we encounter any of the k elements that we are searching for in the course of our execution.
Pr{Y=1}=1 and therefore E[Y]=1.
Let the random variable X=Y+X1+X2+...X(n-k) be the sum of the elements that we encounter in the course of our execution.
E[X]=E[Y+X1+X2+...X(n-k)]=E[Y]+E[X1]+E[X2]+...E[X(n-k)]=1+(n-k)/(k+1)=(n+1)/(k+1).

Avi Cohen
  • 3,102
  • 2
  • 25
  • 26
  • I personally found the explanation for Y a little confusing. In case it is helpful to anyone else, another way to think of this is to let X = (number of elements visited that dont match) and Y = (number of elements visited that do match). Then, we are looking for Z = (number of elements visited) = X + Y but Y=1 with probability 1 so E[X] = E[X] + E[Y] = E[X] + 1 – Richard Fung Apr 02 '16 at 22:41