9

I'm trying to come up with a fast algorithm to compute the quantity b[i]= med |y_i+y_j|, 1<=j!=i<=n when the y_1,...,y_n are sorted already (so b[] is a vector of same length as y[]). I will assume that all elements of y[] are unique and that n is even.

So, the code below computes the b[i]'s the naive (O(n**2)) way: (I wrote this in R for convenience, but I'm language agnostic)

n<-30
a_fast<-b_slow<-rep(NA,n)
y<-sort(rnorm(n,100,1))
z<-y
for(i in 1:n){
    b_slow[i]<-median(abs(y[-i]+y[i]))
}   

I have a tentative proposal --below-- for doing it in O(n). But it only works if y[] contains positive numbers.

My question is: how should I change the fast algorithm to also work when y[] contains both positive and negative numbers? Is this even possible?

EDIT:

And the code below the (tentative) O(n) way (I wrote this in R for convenience, but I'm language agnostic)

tryA<-floor(1+(n-1)/2+1)
tryB<-floor(1+(n-1)/2)
medA<-y[tryA]
medB<-y[tryB]
for(i in 1:(tryA-1)){
        a_fast[i]<-medA+y[i]
}
for(i in tryA:n){
        a_fast[i]<-medB+y[i]
}

Simple example:

Simple, illustrative example. If we have a vector of length 4

-3, -1, 2, 4

Then, for example for i=1, the 3 absolute pairwise sums are

  4 1 1

and their median is 1.

Then, for example for i=2, the 3 absolute pairwise sums are

  4 1 3

and their median is 3.

Here is a longer example with both positive and negative y[]:

 -1.27 -0.69 -0.56 -0.45 -0.23  0.07  0.13  0.46  1.56  1.72

and here are my new b_slow[] (this is the ground thruth, computed the naive way):

1.20 0.92 1.00 1.01 0.79 0.53 0.56 0.53 1.33 1.49

but now, my new a_fast[] don't match no more:

 -1.20 -0.62 -0.49 -0.38 -0.16 -0.16 -0.10  0.23  1.33  1.49

EDIT:

Here is my implementation of Francis's solution (up to the point where we have two sorted array, the median of which is easy to compute). I did it in R to stay in the spirit of the question.

Nonetheless, I seem to be missing a correction factor for the index (the ww in the code below) so the code below is sometimes off by a little bit. This is because in the definition above we compute the medians over n-1 observations (i!=j).

 n<-100
 y<-rnorm(n)
 y<-sort(y)

 b<-rep(NA,n)
 #Naive --O(n**2)-- approch:
 for(i in 1:n){
     b[i]<-median(abs(y[-i]+y[i]))
 }

 k<-rep(NA,n)
 i<-1
 k[i]<-min(na.omit(c(which(y+y[i]>0)[1],n))) #binary search: O(log(n)) -- 
 for(i in 2:n){                  #O(n)
     k_prov<-k[i-1]
     while(y[k_prov]+y[i]>0 && k_prov>0)     k_prov<-k_prov-1
     k[i]<-max(k_prov+1,1)
     #for(i in 1:n){ should give the same result.
     #   k[i]<-which(y+y[i]>0)[1]
     #}
 }

 i<-sample(1:n,1)
 x1<--y[1:(k[i]-1)]-y[i]
 x2<-y[i]+y[n:k[i]]
 x3<-c(x1,x2)
 plot(x3)
 ww<-ifelse(i<k[i] & i>n/2,n/2+1,n/2)
 sort(x3)[ww]  #this can be computed efficiently: O(log(n))
 b[i]          #this is the O(n**2) result.
user189035
  • 5,589
  • 13
  • 52
  • 112
  • If they are already sorted, shouldn't the median be the sum of those at the middle of the list? The only question would seem to be what to do if there is an odd number of elements. – wallyk May 15 '14 at 16:46
  • @wallyk The sum divided by 2. And yeah, that's what I was thinking too. In case of odd elements, just peek element `n/2`. – Filipe Gonçalves May 15 '14 at 16:47
  • 2
    @user189035 The median of a sorted array is the middle element in case of an odd number of elements; and the average of the two middle elements otherwise. If the array is sorted, you get it in `O(1)`. Is there a specific reason why you're not doing this? Or I didn't get the problem? – Filipe Gonçalves May 15 '14 at 16:49
  • This is due to the `abs()` : there is no `abs()` in your O(n) method...So it only works for positive number. I won't say that adding a `abs()` would be a solution... – francis May 15 '14 at 17:32
  • There are two `abs()`, including one here : `b_slow[i]<-median(abs(y[-i]+y[i]))`. Does it give the same result in case of negative number if you remove this `abs()` ? I would expect so. – francis May 15 '14 at 17:39
  • 1
    It's suspicious that your algorithm doesn't work with negative numbers, because you should be able to define `y'_i = y_i+y_1`, compute the median of `y'_i`, and then subtract `y_1` from the result. – Dietrich Epp May 15 '14 at 18:01
  • 1
    @DietrichEpp: I think I don't really understand you comment. First, I would think that you should do y'_i=y_i-y_1 (to make the y's positive). Next, I think you should then add +2*y_1 to compensate (since the y_1 appears twice in y_i+y_j). Finally, if you try it doesn't work (transforming the y before computing the fast version then transforming back doesn't give the same numbers as the slow version computed on the untransformed y) – – user189035 May 15 '14 at 18:22
  • 1
    Okay, I didn't see the absolute value in there. I'm adding it to the title of the question. – Dietrich Epp May 15 '14 at 18:29

1 Answers1

4

Here is a O(Nxln(N)xln(N)) solution :

for all i :

1) find item k such as j<k <=> y[j]+y[i]<0 (dichotomy, O(ln(N)))

k separates two sorted lists : one above -y[i], the other below -y[i], for which the sign should be changed to get abs(y[i]+y[j]). Now, we are looking for the median of these lists.

From here, it is just the problem of finding the median of two sorted lists, repeated n times.

2)Let's pick the maximum (M=abs(y[1]-y[i]) or M=abs(y[size]-y[i])) and minimum (m around k) of these lists and restart a dichotomy (O(ln(N)). Let's start by picking the middle (M+m)/2...at any stage, let pick the middle...

3)Stage of this big dichotomy : How many items y[j]+y[i] are above (M+m)/2 in the first list ? Once again a dichotomy... O(ln(N)). How many items -y[j]-y[i] are above (M+m)/2 in the second list ? Guess what ? Dichotomy... Sum these two numbers. If it is above (size-1)/2, m=(M+m)/2. Otherwise M=(M+m)/2.

4)If m=M stop ! b[i]=m;

I guess somebody will come with a better solution...

Edit : I should thank @user189035 for his link to an O(ln(n+m)) algorithm to compute the median of two sorted lists. How to find the kth smallest element in the union of two sorted arrays?

Bye,

Community
  • 1
  • 1
francis
  • 9,525
  • 2
  • 25
  • 41