2

Given an Array A of N non negative integers, find an integer k, such that the sum of difference of each element with k is minimum.
i.e. Summation(abs(A[i] - k)), 1 <= i <= N, or simply
|A[1] - k| + |A[2] - k| + |A[3] - k| + ... + |A[N] - k|

My Approach:

low = minimumValueInArray(A);
high= maximumValueInArray(A);
ans = Infinite;
for (k = low; k <= high; k++) {
    temp = 0;
    for (i = 1; i <= N; i++) {
        temp += abs(A[i] - K);
    }
    ans = min(ans, temp);
}
return ans;

This is simply brute force approach trying to solve for all values of K.Can we optimise it.
What is the smarter way of doing this?

Reference: This was my logic behind this Codejam Round 1B problem

vaibhavatul47
  • 2,766
  • 4
  • 29
  • 42
  • No, sort the list of numbers and take the element in the middle or (if number of elements is even) the midpoint of the two elements in the middle as k. Also called median. – Lutz Lehmann May 04 '14 at 11:33
  • @LutzL: yes that would be fast and optimum too, but this is also correct. – vaibhavatul47 May 05 '14 at 18:50

3 Answers3

7

I get essentially the median. To minimize the cost function, take the derivative.

enter image description here

To find the minimum of the cost function, find where the derivative is zero. The cost function is not continuously differentiable everywhere, but it is piecewise differentiable. Let S = the sorted numbers of A and si be the ith largest number.

Case 1. If m is odd

There will be floor(m/2) ai's less than the median and floor(m/2) ai's greater than the median. Selecting x = the median, gives -m/2 + 0 + m/2 = 0.

enter image description here

Case 2. If m is even

The derivative will be zero for values in between si and s(i+1); i = m/2. Then one can select any number k such that

enter image description here

Simple example 1 2 4 50

picking 2: 1 + 0 + 2 + 48 = 51

picking 4: 3 + 2 + 0 + 46 = 51

picking 3: 2 + 1 + 1 + 47 = 51

So arbitrarily pick s(m/2)

For a better algorithm, O(NlogN) you can sort the numbers and then pick (m+1)/2 or m/2 as appropriate from above or you can use the kth largest element which can compute the answer in O(N).

Community
  • 1
  • 1
waTeim
  • 9,095
  • 2
  • 37
  • 40
1

Given a set of numbers A_1,...,A_N, it is known (see e.g. http://homepages.gac.edu/~holte/courses/mcs256/problems/median.pdf ) that the median minimizes the sum of absolute deviation.

Since you have an array of integers, either:

(a) N is odd and the median is an integer m, which you set k as.

(b) N is even, and the median is the average of two integers, a and b. In this case it turns out that all numbers between a and b minimize the sum of the absolute deviation. So you can pick k as a, b or any integer in-between (if there is one).

For example, if the numbers are 1,3,4,6,9 - the median is 4 which you set k as. If the numbers are 4,7,12,15 the median is (7+12)/2 and you can set k as any number from 7 to 12. For example k=7 gives total deviation (7-4)+(7-7)+(12-7)+(15-7)=16 and k=12 gives total deviation (12-4)+(12-7)+(12-12)+(15-12) = 16.

related
  • 794
  • 5
  • 14
  • The answer given by the median will still not be optimum. The OP has said that K can be any integral number and it is not necessary for the number to be from the array. – Nikunj Banka May 04 '14 at 13:47
  • @invin: If you read the answer, you will see at no point it is assumed that the number needs to be from the array. However, it turns that the optimal number can be taken from the array. – related May 05 '14 at 14:12
-1

You can represent all the values in the array in the form of points lying on the number line.

For example the points lying on the number line can look like:

_____________________4___________7_______________________12___________15

So according to the problem definition, we can say that the point on the number line that is equidistant from all the points should give us the correct answer.

So what you have to do is find the average of all the numbers. Return the answer as the integer that is more closer to the avg. eg For 2.66 return 3, for 2.2 return 2, for 2.5 return any of 2 or 3.

Running Time: O(N) to calculate the average of the numbers.

Nikunj Banka
  • 11,117
  • 16
  • 74
  • 112
  • I don't think that's correct. Do you have some derivation to demonstrate that? If I remember correctly, the mean minimizes the sum of squared differences, and the median minimizes the sum of absolute differences. But if `k` is constrained to be an integer, that could change the problem. – Robert Dodier May 04 '14 at 05:24
  • You yourself say that the median minimizes the sum of absolute differences. In this question we want to minimize the sum of absolute differences and it is not necessary to pick an element from the list. So instead of taking the median, we can easily take the mean and expect that we will get the correct answer. – Nikunj Banka May 04 '14 at 06:10
  • @invin: Depending on the distribution of the numbers, there can be a very large difference between mean and median. The median is more robust against outliers, i.e., changing the value of outliers (without crossing other values) will markedly change the mean, but will have no influence on the median. – Lutz Lehmann May 04 '14 at 11:30
  • 1
    @invin: Say the numbers are A=[1,20,300,4000,50000]. Your method would pick k=10864, giving total abs. deviation 78271. This is suboptimal, since k=300 gives total abs. deviation 53979. This happens because the mean does not equal the median. – related May 04 '14 at 12:04