You can do it in linear time, assuming your data is already sorted.
You can view your data as points on an axis between `min_value` and `max_value`. An optimal solution never uses non-contiguous groups (two interleaved groups can always be rearranged into two contiguous ones of no greater total length), so any optimal solution can be represented as K segments on that axis, each segment `[x1, x2]` covering all the numbers in your data between `x1` and `x2`.
The total cost of your solution is the sum of all the segment lengths, which is also `max_value - min_value - (space between your segments)`, since the segments and the gaps between them together cover the whole range. The space between your segments is itself made of K - 1 "empty segments", i.e. segments containing none of your input numbers, i.e. segments between two consecutive input numbers. So what you want to do is maximize the total length of K - 1 such segments.
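For example, with sorted data `[1, 2, 4, 9, 10]` and K = 2, the gaps are `1, 2, 5, 1`; removing the widest gap (5, between 4 and 9) yields the groups `[1, 2, 4]` and `[9, 10]`, with total cost `(4 - 1) + (10 - 9) = 4`, which is indeed `(10 - 1) - 5`.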
So all you have to do is (simple version; a code sketch follows this list):

- if necessary, sort your inputs
- compute `B[i] = A[i+1] - A[i]` for all `i` (`A` being the sorted input array)
- retrieve the K - 1 biggest values of `B` (still linear time; this can actually be done at the same time as the previous step)
- these are the boundaries of your groups, so iterate through `A` once more to build the groups themselves (this step, too, can be merged with the previous ones)
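A minimal Python sketch of this simple version (the function name is mine, and for simplicity it sorts all the gaps rather than selecting the K - 1 biggest ones in linear time, so it is O(n log n) even on pre-sorted input):

```python
def split_into_k_groups(data, k):
    """Split 1D values into k contiguous groups minimizing the sum of group ranges."""
    a = sorted(data)  # step 1: sort (skip if already sorted); assumes k >= 1, data non-empty
    # step 2: B[i] = A[i+1] - A[i], remembering each gap's position
    gaps = [(a[i + 1] - a[i], i) for i in range(len(a) - 1)]
    # step 3: the k-1 widest gaps mark the group boundaries
    # (full sort here for simplicity; see the complexity note below)
    gaps.sort(reverse=True)
    cut_after = sorted(i for _, i in gaps[:k - 1])
    # step 4: slice the sorted array at those boundaries
    groups, start = [], 0
    for i in cut_after:
        groups.append(a[start:i + 1])
        start = i + 1
    groups.append(a[start:])
    return groups

print(split_into_k_groups([9, 4, 1, 10, 2], 2))  # [[1, 2, 4], [9, 10]]
```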
All your groups will necessarily be non-empty if the number of distinct values is at least K. If not, you can easily split groups containing duplicates of the same value to make all your groups non-empty.
Complexity (if already sorted): O(n·K) at most, which is O(n) if K is constant. If K is not constant, improve the search for the best K - 1 segments, e.g. with a bounded heap as sketched below, to get at most O(n log n). As written, the additional memory needed is easily kept to O(K).
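For non-constant K, a bounded heap keeps the selection step at O(n log K) without sorting all n - 1 gaps; here is a sketch replacing step 3 above (`a` is the sorted array, and the function name is mine):

```python
import heapq

def widest_gap_cuts(a, k):
    """Indices after which to cut a sorted list `a`: the k-1 widest gaps, in O(n log k)."""
    gaps = ((a[i + 1] - a[i], i) for i in range(len(a) - 1))
    return sorted(i for _, i in heapq.nlargest(k - 1, gaps))
```

`heapq.nlargest` only ever holds K - 1 candidates at a time, which also matches the O(K) additional memory bound.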