Firstly, the complexity of the naive algorithm is O(k(n-k+1)) (usually approximated as O(k*n)), not O(n^2): for each of the n-k+1 consecutive subarrays, you perform up to k comparisons to find its maximum.
You can do better than this with some memoization, using an additional array of length k, which we can call maximums. That array stores indices into your data, ordered so that its first element is always the index of the current window's maximum. For every iteration through your data set, you examine the first element of maximums, remove any "expired" indices (those that have slid out of the window), and the first element is then your answer for the current iteration.
As you slide a window (size k) across your data, you push the current index onto the tail of maximums, and then prune it to maintain this invariant: the values referenced by maximums must decrease from front to back. If the newly pushed value is not less than the value at the preceding slot, you bubble its index toward the beginning of maximums, one spot at a time, until the invariant holds again.
In practice, it's best to treat the maximums array as a ring buffer. The pruning process shrinks the tail back toward the head, while popping an "expired" maximum (when the window slides past it) advances the head one step.
It's a little clunky, but here's some working code to illustrate:
#include <iostream>
#include <vector>

int main()
{
    const int window_size = 4;
    std::vector<int> vals = { 3, 7, 20, 6, 12, 2, 0, 99, 5, 16 };

    // Ring buffer of indices into vals; value-initialized to zero, which
    // conveniently seeds it with index 0 before the loop begins.
    std::vector<int> maximums( window_size );
    int mhead = 0, mtail = 0;

    for( int i = 1; i < (int)vals.size(); i++ )
    {
        // Clean out an expired maximum (one that slid out of the window).
        if( maximums[mhead] + window_size <= i )
        {
            int next_mhead = (mhead + 1) % window_size;
            if( mtail == mhead ) mtail = next_mhead;
            mhead = next_mhead;
        }

        if( vals[i] >= vals[ maximums[mtail] ] )
        {
            // Replace the tail, then bubble the new index toward the head
            // while it beats the values in front of it.
            maximums[mtail] = i;
            while( mhead != mtail &&
                   vals[ maximums[mtail] ] >= vals[ maximums[(mtail + window_size - 1) % window_size] ] )
            {
                int prev_mtail = (mtail + window_size - 1) % window_size;
                maximums[prev_mtail] = maximums[mtail];
                mtail = prev_mtail;
            }
        }
        else
        {
            // Append a new non-maximum at the tail.
            mtail = (mtail + 1) % window_size;
            maximums[mtail] = i;
        }

        // Output the current window's maximum.
        if( i >= window_size - 1 )
        {
            std::cout << vals[ maximums[mhead] ] << " ";
        }
    }
    std::cout << std::endl;
    return 0;
}
Now, the time complexity...
Best case is a flat O(n) — one pass with O(1) work per element — which happens if all your data is sorted (either ascending or descending).
Worst case, I believe, is about 2n operations in total. The only way one iteration can require k extra operations is if the preceding iterations already spent roughly k steps of O(1) work filling the ring buffer, and after such a purge the buffer is nearly empty again. A cleaner way to see it: each index is pushed into the buffer exactly once and removed (by expiry or by bubbling) at most once, so the total work across all n iterations is bounded by roughly 2n operations, regardless of how the fills and purges interleave.
And finally, we can wrap up and call the whole thing O(n), since any constant factor becomes insignificant for large n. It actually came out better than I expected. =)