I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.
First of all we sort the numbers:
[1561,1572,1572,1609,1682,1731,1731,2041]
Then we compute the differences, keeping track of which the indices of the numbers that contributed to each difference:
[(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
So we got 11 by getting the difference between number at index 0 and number at index 1, 37 from the numbers at indices 2 & 3.
I then sorted this list, so it tells me which pairs give me the smallest difference:
[(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is, in this list the third item shares an index with the first, so we'd only actually get 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
From here, I came up with this algorithm. Throughout, there is a set of accepted indices which starts empty, and there's a number of numbers left to select n:
- If n is 0, we're done.
- if n is 1, and the first item will provide just 1 index which isn't in our set, we taken the first item, and we're done.
- if n is 2 or more, and the first item will provide 2 indices which aren't in our set, we taken the first item, and we recurse (e.g. goto 1). This time looking for n - 2 numbers that make the smallest difference in the remainder of the list.
This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.
Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
So, now we get to look at the special cases:
- ** as above **
- ** as above **
- If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
- If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.
4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?
- We'll want to take whichever branch produces the lowest sum.
- ...but only if it will use up the right number of indices.
So step 4 becomes something like this (pseudocode):
x = numberOfIndicesProvidedBy(currentDifference)
branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
branchB = findSmallestDifference (n , remainingDifferences) // recurse looking for **n**
sumA = currentDifference + sumOf(branchA)
sumB = sumOf(branchB)
validA = indicesAddedBy(branchA) == n
validB = indicesAddedBy(branchB) == n
if not validA && not validB then return an empty branch
if validA && not validB then return branchA
if validB && not validA then return branchB
// Here, both must be valid.
if sumA <= sumB then return branchA else return branchB
I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:
findSmallestDifference = findSmallestDifference' Set.empty
findSmallestDifference' _ _ [] = []
findSmallestDifference' taken n (d:ds)
| n == 0 = [] -- Case 1
| n == 1 && provides1 d = [d] -- Case 2
| n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
| provides0 d = findSmallestDifference' taken n ds -- Case 3a (See Edit)
| validA && not validB = branchA -- Case 4
| validB && not validA = branchB -- Case 4
| validA && validB && sumA <= sumB = branchA -- Case 4
| validA && validB && sumB <= sumA = branchB -- Case 4
| otherwise = [] -- Case 4
where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
branchB = findSmallestDifference' taken n ds
sumA = sumDifferences branchA
sumB = sumDifferences branchB
validA = n == (indicesTaken branchA)
validB = n == (indicesTaken branchA)
newTaken x = insertIndices x taken
Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:
*Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
Smallest Difference found is 48
1572 - 1572 = 0
1731 - 1731 = 0
1572 - 1561 = 11
1609 - 1572 = 37
*Main> findLeastDiff 4 [1515, 1520, 1500,1535]
Smallest Difference found is 30
1515 - 1500 = 15
1535 - 1520 = 15
This has become long, but I've tried to be explicit. Hopefully it was worth while.
Edit : There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.