Find the maximum distance between two elements that don't differ by more than k

Question

Given an array arr, find the maximum abs(i-j) such that abs(arr[i] - arr[j]) <= k.

After a lot of thought, I came up with the following algorithm,

1) Create a new array of pair<arr[i], i> (say arrayIndexPairs)
2) Sort this arrayIndexPairs based on the array value (first of pair).
3) Build a segment tree on the index (second of pair) with the arrayIndexPairs so that we can answer range max queries
4) for i <- 0 to n-1
    4.1) rightIndex = Binary search the array values (first of pair) for ceil(arrayIndexPairs[i].first) in the interval [i+1, n-1]
    4.2) int maxIndex = rangeQueryForMax(i+1, rightIndex)
    4.3) result = max(result, maxIndex - i);
return result

The complexity is O(n log n) for the sort + for every element we do a binary search O(log n) + rangeQuery, O(log n). The overall time complexity is O(nlogn + n*2*logn) which is asymptotically O(nlogn).

Is the approach correct? Is is possible to formulate a linear time solution? I tried using hashmaps but find it very hard to arrive at a linear solution.

I've seen this problem before. A classic job interview problem at companies like Google. — IceArdor, Sep 21 '16 at 07:00

Martin Gottweis · Answer 1 · 2016-09-21T07:59:11.233

1

I came up with this:

def find_max_abs(l, k):
    for lenght_of_interval in range(len(l), 1, -1):
        for start_of_interval in range(0, len(l) - lenght_of_interval + 1):
            if abs(l[start_of_interval] - l[start_of_interval + lenght_of_interval - 1]) <= k:
                return lenght_of_interval - 1

Should work nicely, but it's not linear(worst case N²). I am interested if a linear algorithm exists

edited Sep 21 '16 at 07:59

answered Sep 21 '16 at 06:59

Martin Gottweis

2,721
13
27

I'm not sure if this will work. We are looking for two elements in the array that are farthest apart. Sorting would destroy information on the original indexing. Also return l[-1] - l[0] is the max difference, we are looking for the distance! – harun_a Sep 21 '16 at 07:27

score 1 · Accepted Answer · answered Sep 21 '16 at 07:15

1

For the general case, your idea seems efficient.

For the case where the elements are all integers, you can do it in Θ(n k) expected time. If k = o(log(n)), this is a saving. If k is a constant, this is linear in n.

Place all your elements in a hash table mapping each element e to its position in the array i (if there is more than a single e, let each entry you place in the hash table overwrite the previous one - it doesn't matter).
For each element e at position i, and d = -k, -(k - 1), ... 0, 1, ... k, check if e + d is in the hash table. If so, you have the position of e + d, say j, from the hash table.
Retain the positions of the maximal distance you found in 2.

answered Sep 21 '16 at 07:15

Ami Tavory

74,578
11
141
185

Isn't theta(nk) a pseudo polynomial solution? In that case, O(n^2) works better? Also, if k is a constant but it's huge (of the order of billions), again won't O(n^2) be better? – harun_a Sep 21 '16 at 07:23
That's why I specified the condition that *k = o(log(n))* (note the use of the small o - I even linked to the definition). If that's the case, then this will be more efficient by definition. If, say, k=3, then it would start being more efficient from some value of n, and if k=1B, then it would start being more efficient from some larger value of n. I also stated (in the first sentence) that, for the general case, your original solution is the efficient one. – Ami Tavory Sep 21 '16 at 07:28
Ah, I misread the k = o(log(n)) part! Thanks! A llinear solution is not possible? – harun_a Sep 21 '16 at 07:29
1

Saying that a linear solution is not possible - is something that requires proof. I only know that, for the general case, I couldn't find one (that's a much weaker statement). Proofs of negative results are usually more abstract (e.g., the n logn lower bound on comparison based sorting). – Ami Tavory Sep 21 '16 at 07:30

jurhas · Answer 3 · 2016-09-22T11:07:05.537

It seems a bit forced the definition of "linear". I would approach in a different way. We can notice that the function distance d is searching for the max. So we know that there is the following combinations: - DIST COUPLES COUNT

d=n-1 (a[0],a[n-1]) 1
d=n-2 (a[0],a[n-2]), (a[1],a[n-1]) 2
...
d=1 (a[0],a[1])...(a[n-2],a[n-1]) n-1

Since we search the maximum we are going to investigate first by the maximal distances . So we have best case O(1), worst case the sum from 1 to n-1 =(n-1)*(n/2) =O(n2). Average I would expect better performances, since it is can be implemented very efficently.
Here the C implementation:

#include "stdio.h"
#include "stdlib.h"
#define ARRAYSIZE 10000

int find_dist(int * result, const int *array, int n,int k )
{
  int i,j,ti;
  for (i=n-1;i>0;i--)
  { 
     ti=i;       
     for (j=0;ti< n ; j++,ti++)
        if (abs(array[j]-array[ti])<=abs(k))
            {
                result[0]=j;
                result[1]=ti;
                return 1;
            }
   }
return 0;
}

int main()
{
  int array[ARRAYSIZE],result[2],i;
  for (i=0;i<ARRAYSIZE;i++)
    {
        array[i]=rand()%1000;
        //printf("%d ",array[i]);
    }
  if (find_dist(result,array,ARRAYSIZE,3))
    printf ("\n%d %d\n",result[0],result[1]);
  else
    printf ("No items match requirement\n");
 }

Find the maximum distance between two elements that don't differ by more than k

3 Answers3