How to find the subarray that has sum closest to zero or a certain value t in O(nlogn)

Question

Actually it is the problem #10 of chapter 8 of Programming Pearls 2nd edition. It asked two questions: given an array A[] of integers(positive and nonpositive), how can you find a continuous subarray of A[] whose sum is closest to 0? Or closest to a certain value t?

I can think of a way to solve the problem closest to 0. Calculate the prefix sum array S[], where S[i] = A[0]+A[1]+...+A[i]. And then sort this S according to the element value, along with its original index information kept, to find subarray sum closest to 0, just iterate the S array and do the diff of the two neighboring values and update the minimum absolute diff.

Question is, what is the best way so solve second problem? Closest to a certain value t? Can anyone give a code or at least an algorithm? (If anyone has better solution to closest to zero problem, answers are welcome too)

I have a sorted array with entries colored red and black. How do I find the closest red-black pair? How does that solve your problem? — David Eisenstat, May 05 '13 at 23:14
Does “subarray” in this context denote consecutive array elements or can you leave holes? — MvG, May 06 '13 at 08:49
@MvG: I don't have my copy of Bentley handy, but I'm pretty sure he means contiguous elements. — Fred Foo, May 06 '13 at 09:09
@DavidEisenstat I don't get the hint... the sorted array doesn't contain just 2 distinct values, so how does that help? — Henley, Jul 26 '13 at 00:27

score 7 · Answer 1 · answered Jul 29 '13 at 10:14

To solve this problem, you can build an interval-tree by your own, or balanced binary search tree, or even beneficial from STL map, in O(nlogn).

Following is use STL map, with lower_bound().

#include <map>
#include <iostream>
#include <algorithm>
using namespace std;

int A[] = {10,20,30,30,20,10,10,20};

// return (i, j) s.t. A[i] + ... + A[j] is nearest to value c
pair<int, int> nearest_to_c(int c, int n, int A[]) {
    map<int, int> bst;
    bst[0] = -1;
    // barriers
    bst[-int(1e9)] = -2;
    bst[int(1e9)] = n;

    int sum = 0, start, end, ret = c;
    for (int i=0; i<n; ++i) {
            sum += A[i];
            // it->first >= sum-c, and with the minimal value in bst
            map<int, int>::iterator it = bst.lower_bound(sum - c);
            int tmp = -(sum - c - it->first);
            if (tmp < ret) {
                    ret = tmp;
                    start = it->second + 1;
                    end = i;
            }

            --it;
            // it->first < sum-c, and with the maximal value in bst
            tmp = sum - c - it->first;
            if (tmp < ret) {
                    ret = tmp;
                    start = it->second + 1;
                    end = i;
            }

            bst[sum] = i;
    }
    return make_pair(start, end);
}

// demo
int main() {
    int c;
    cin >> c;
    pair<int, int> ans = nearest_to_c(c, 8, A);

    cout << ans.first << ' ' << ans.second << endl;
    return 0;
}

This is the correct solution IMHO. It needs more upvote. Basically it is going over the array, keeping a sorted history of prefix sums, and for current `sum`, finding the best candidate in the history closest to `sum - t`. It is O(NlogN) and works in one pass. — OnurC, Nov 11 '13 at 04:58
Why don't we also consider candidates closest to `(sum + c)`? — Konstantin Milyutin, Nov 13 '16 at 12:30

score 4 · Answer 2 · edited May 23 '17 at 11:53

You can adapt your method. Assuming you have an array S of prefix sums, as you wrote, and already sorted in increasing order of sum value. The key concept is to not only examine consecutive prefix sums, but instead use two pointers to indicate two positions in the array S. Written in a (slightly pythonic) pseudocode:

left = 0                 # Initialize window of length 0 ...
right = 0                # ... at the beginning of the array
best = ∞                 # Keep track of best solution so far
while right < length(S): # Iterate until window reaches the end of the array
  diff = S[right] - S[left]
  if diff < t:           # Window is getting too small
    if t - diff < best:  # We have a new best subarray
      best = t - diff
      # remember left and right as well
    right = right + 1    # Make window bigger
  else:                  # Window getting too big
    if diff - t < best   # We have a new best subarray
      best = diff - t
      # remember left and right as well
    left = left + 1      # Make window smaller

The complexity is bound by the sorting. The above search will take at most 2n=O(n) iterations of the loop, each with computation time bound by a constant. Note that the above code was conceived for positive t.

The code was conceived for positive elements in S, and positive t. If any negative integers crop up, you might end up with a situation where the original index of right is smaller than that of left. So you'd end up with a sub sequence sum of -t. You can check this condition in the if … < best checks, but if you only suppress such cases there, I believe that you might be missing some relevant cases. Bottom line is: take this idea, think it through, but you'll have to adapt it for negative numbers.

Note: I think that this is the same general idea which Boris Strandjev wanted to express in his solution. However, I found that solution somewhat hard to read and harder to understand, so I'm offering my own formulation of this.

I think this is incorrect: First, as you have mentioned it does not handle -ve values. And for all +ve values, you do not need to pre-calculate and sort the prefix sums. The positive values sub-problem can be solved with your algorithm, modified to keep a running sum between `left` and `right` and comparing it to `t`. — OnurC, Nov 11 '13 at 04:52
@OnurC: True is that for positive array elements, an approach without sorted prefix sums would work as well. I believe that my approach might be easier to extend in such a way that it will handle negative values as well. But this is more of a gut feeling, I haven't thought this through yet. In any case, while my code might be unneccessary for the positive case, I don't see it as incorrect. Do you? If so, can you provide an example where it breaks? — MvG, Nov 11 '13 at 08:01

Boris Strandjev · Answer 3 · 2013-05-06T09:12:25.093

2

Your solution for the 0 case seems ok to me. Here is my solution to the second case:

You again calculate the prefix sums and sort.
You initialize to indices start to 0 (first index in the sorted prefix array) end to last (last index of the prefix array)
you start iterating over start 0...last and for each you find the corresponding end - the last index in which the prefix sum is such that prefix[start] + prefix[end] > t. When you find that end the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1] (the latter taken only if end > 0)
the most important thing is that you do not search for end for each start from scratch - prefix[start] increases in value when iterating over all possible values for start, which means that in each iteration you are interested only in values <= the previous value of end.
you can stop iterating when start > end
you take the best of all values obtained for all start positions.

It can easily be proved that this will give you complexity of O(n logn) for the entire algorithm.

edited May 06 '13 at 09:12

answered May 06 '13 at 09:07

Boris Strandjev

46,145
15
108
135

1

Since the overall complexity is `O(n*log(n))` anyway, you could also use binary search to find `end` for a specific value of `start`. The linear algorithm is probably simpler to code, though :) – Niklas B. May 06 '13 at 10:04
Can you please explain this part: "When you find that end the best solution for start is either prefix[start] + prefix[end] or prefix[start] + prefix[end - 1]" Say the sorted prefix sums are 1, 2, 50, 100, 1000, 10000, 100000 and t is 2. We start at prefix[0] + prefix[6], which is 1 + 1000000 = 100001. The best solution, you're telling me is this, or 1 + 10000? Isn't the best solution 1 + 2 in actuality? – Henley Jul 26 '13 at 00:37
OK, I understand the above EXCEPT I don't think it actually works if the original array has negative #'s. I also think your solution fails if t != 0 because you have to take into account where the 2 prefix sums end in the original array. Because if t= 100, then 200-100 is indeed 100, but 100-200 is very far from 100. It doesn't matter if t=0 because +n and -n are equal distance from 0. – Henley Jul 26 '13 at 01:43
As a concrete example, say the original array is : 75, 25, -75, -25, 1. The prefix sum of first 2 elements is 100, prefix sum of all elements is 1. Suppose t = 100.1, and you choose 1, and 100 as the best prefix sum pair. 1 - 100 = -99, which is nowhere close to 100 as the other candidates. – Henley Jul 26 '13 at 01:58
My solution would be similar to yours with some adjustments. So I'd keep a HashMap mapping each of the sorted prefix sums to the index of the range it represents. Then when comparing 2 prefix sums, you look at the indices first. So you do PrefixSum[i] - PrefixSum[j] where i's prefix sum covers the bigger range than j's. – Henley Jul 26 '13 at 02:01

score 1 · Answer 4 · edited Dec 02 '15 at 02:21

I found this question by accident. Although it's been a while, I just post it. O(nlogn) time, O(n) space algorithm. This is running Java code. Hope this help people.

import java.util.*;

public class FindSubarrayClosestToZero {

    void findSubarrayClosestToZero(int[] A) {
        int curSum = 0;
        List<Pair> list = new ArrayList<Pair>();

        // 1. create prefix array: curSum array
        for(int i = 0; i < A.length; i++) {
            curSum += A[i];
            Pair pair = new Pair(curSum, i);
            list.add(pair);
        }

        // 2. sort the prefix array by value
        Collections.sort(list, valueComparator);

        // printPairList(list);
        System.out.println();


        // 3. compute pair-wise value diff: Triple< diff, i, i+1>
        List<Triple> tList = new ArrayList<Triple>();
        for(int i=0; i < A.length-1; i++) {
            Pair p1 = list.get(i);
            Pair p2 = list.get(i+1);
            int valueDiff = p2.value - p1.value;

            Triple Triple = new Triple(valueDiff, p1.index, p2.index);          
            tList.add(Triple);
        }       

        // printTripleList(tList);
        System.out.println();

        // 4. Sort by min diff
        Collections.sort(tList, valueDiffComparator);
        // printTripleList(tList);

        Triple res = tList.get(0);

        int startIndex = Math.min(res.index1 + 1, res.index2);
        int endIndex = Math.max(res.index1 + 1, res.index2);

        System.out.println("\n\nThe subarray whose sum is closest to 0 is: ");
        for(int i= startIndex; i<=endIndex; i++) {
            System.out.print(" " + A[i]);
        }
    }

    class Pair {
        int value;
        int index;

        public Pair(int value, int index) {
            this.value = value;
            this.index = index;
        }
    }

    class Triple {
        int valueDiff;
        int index1;
        int index2;

        public Triple(int valueDiff, int index1, int index2) {
            this.valueDiff = valueDiff;
            this.index1 = index1;
            this.index2 = index2;
        }
    }

    public static Comparator<Pair> valueComparator = new Comparator<Pair>() {
        public int compare(Pair p1, Pair p2) {
            return p1.value - p2.value;
        }
    };      

    public static Comparator<Triple> valueDiffComparator = new Comparator<Triple>() {
        public int compare(Triple t1, Triple t2) {
            return t1.valueDiff - t2.valueDiff;
        }
    };

    void printPairList(List<Pair> list) {
        for(Pair pair : list) {
            System.out.println("<" + pair.value + " : " + pair.index + ">");
        }
    }

    void printTripleList(List<Triple> list) {
        for(Triple t : list) {
            System.out.println("<" + t.valueDiff + " : " + t.index1 + " , " + t.index2 + ">");
        }
    }


    public static void main(String[] args) {
        int A1[] = {8, -3, 2, 1, -4, 10, -5};       // -3, 2, 1
        int A2[] = {-3, 2, 4, -6, -8, 10, 11};      // 2, 4, 6
        int A3[] = {10, -2, -7};                                // 10, -2, -7

        FindSubarrayClosestToZero f = new FindSubarrayClosestToZero();
        f.findSubarrayClosestToZero(A1);
        f.findSubarrayClosestToZero(A2);
        f.findSubarrayClosestToZero(A3);
    }
}

usacomath · Answer 5 · 2016-05-17T23:19:32.583

Solution time complexity : O(NlogN)
Solution space complexity : O(N)

[Note this problem can't be solved in O(N) as some have claimed]

Algorithm:-

Compute cumulative array(here,cum[]) of given array [Line 10]
Sort the cumulative array [Line 11]
Answer is minimum amongst C[i]-C[i+1] , $\forall$ i∈[1,n-1] (1-based index) [Line 12]

C++ Code:-

#include<bits/stdc++.h>
#define M 1000010
#define REP(i,n) for (int i=1;i<=n;i++) 
using namespace std;
typedef long long ll;
ll a[M],n,cum[M],ans=numeric_limits<ll>::max(); //cum->cumulative array
int main() {
    ios::sync_with_stdio(false);cin.tie(0);cout.tie(0);
    cin>>n; REP(i,n) cin>>a[i],cum[i]=cum[i-1]+a[i];
    sort(cum+1,cum+n+1);
    REP(i,n-1) ans=min(ans,cum[i+1]-cum[i]);
    cout<<ans; //min +ve difference from 0 we can get
}

Jingguo Yao · Answer 6 · 2013-12-24T12:45:01.503

After more thinking on this problem, I found that @frankyym's solution is the right solution. I have made some refinements on the original solution, here is my code:

#include <map>
#include <stdio.h>
#include <algorithm>
#include <limits.h>

using namespace std;

#define IDX_LOW_BOUND -2

// Return [i..j] range of A
pair<int, int> nearest_to_c(int A[], int n, int t) {
  map<int, int> bst;
  int presum, subsum, closest, i, j, start, end;
  bool unset;
  map<int, int>::iterator it;

  bst[0] = -1;
  // Barriers. Assume that no prefix sum is equal to INT_MAX or INT_MIN.
  bst[INT_MIN] = IDX_LOW_BOUND;
  bst[INT_MAX] = n;
  unset = true;
  // This initial value is always overwritten afterwards.
  closest = 0; 
  presum = 0;
  for (i = 0; i < n; ++i) {
    presum += A[i];
    for (it = bst.lower_bound(presum - t), j = 0; j < 2; --it, j++) {
      if (it->first == INT_MAX || it->first == INT_MIN) 
        continue;
      subsum = presum - it->first;
      if (unset || abs(closest - t) > abs(subsum - t)) {
        closest = subsum;
        start = it->second + 1;
        end = i;
        if (closest - t == 0)
          goto ret;
        unset = false;
      }
    }
    bst[presum] = i;
  }
ret:
  return make_pair(start, end);
}

int main() {
  int A[] = {10, 20, 30, 30, 20, 10, 10, 20};
  int t;
  scanf("%d", &t);
  pair<int, int> ans = nearest_to_c(A, 8, t);
  printf("[%d:%d]\n", ans.first, ans.second);
  return 0;
}

score 0 · Answer 7 · answered Dec 26 '14 at 23:49

As a side note: I agree with the algorithms provided by other threads here. There is another algorithm on top of my head recently.

Make up another copy of A[], which is B[]. Inside B[], each element is A[i]-t/n, which means B[0]=A[0]-t/n, B[1]=A[1]-t/n ... B[n-1]=A[n-1]-t/n. Then the second problem is actually transformed to the first problem, once the smallest subarray of B[] closest to 0 is found, the subarray of A[] closest to t is found at the same time. (It is kinda tricky if t is not divisible by n, nevertheless, the precision has to be chosen appropriate. Also the runtime is O(n))

score 0 · Answer 8 · answered May 28 '15 at 06:03

I think there is a little bug concerning the closest to 0 solution. At the last step we should not only inspect the difference between neighbor elements but also elements not near to each other if one of them is bigger than 0 and the other one is smaller than 0.

Sorry, I thought I am supposed to get all answers for the problem. Didn't see it only requires one.

score 0 · Answer 9 · answered Jul 17 '17 at 13:33

Cant we use dynamic programming to solve this question similar to kadane's algorithm.Here is my solution to this problem.Please comment if this approach is wrong.

#include <bits/stdc++.h>
using namespace std;
int main() {
 //code
 int test;
 cin>>test;
 while(test--){
     int n;
     cin>>n;
     vector<int> A(n);
     for(int i=0;i<n;i++)
         cin>>A[i];
    int closest_so_far=A[0];
    int closest_end_here=A[0];
    int start=0;
    int end=0;
    int lstart=0;
    int lend=0;
    for(int i=1;i<n;i++){
        if(abs(A[i]-0)<abs(A[i]+closest_end_here-0)){
             closest_end_here=A[i]-0;
             lstart=i;
             lend=i;
        }
        else{
             closest_end_here=A[i]+closest_end_here-0;
             lend=i;
        }
        if(abs(closest_end_here-0)<abs(closest_so_far-0)){
             closest_so_far=closest_end_here;
             start=lstart;
             end=lend;
        }
    }
    for(int i=start;i<=end;i++)
         cout<<A[i]<<" ";
         cout<<endl;
    cout<<closest_so_far<<endl;
    
 }
 return 0;
}

score -1 · Answer 10 · edited Jun 26 '15 at 15:18

Here is a code implementation by java:

public class Solution {
    /**
     * @param nums: A list of integers
     * @return: A list of integers includes the index of the first number 
     *          and the index of the last number
     */
    public ArrayList<Integer> subarraySumClosest(int[] nums) {
        // write your code here
        int len = nums.length;
        ArrayList<Integer> result = new ArrayList<Integer>();
        int[] sum = new int[len];
        HashMap<Integer,Integer> mapHelper = new HashMap<Integer,Integer>();
        int min = Integer.MAX_VALUE;
        int curr1 = 0;
        int curr2 = 0;
        sum[0] = nums[0];
        if(nums == null || len < 2){
            result.add(0);
            result.add(0);
            return result;
        }
        for(int i = 1;i < len;i++){
            sum[i] = sum[i-1] + nums[i];
        }
        for(int i = 0;i < len;i++){
            if(mapHelper.containsKey(sum[i])){
                result.add(mapHelper.get(sum[i])+1);
                result.add(i);
                return result;
            }
            else{
                mapHelper.put(sum[i],i);
            }
        }
        Arrays.sort(sum);
        for(int i = 0;i < len-1;i++){
            if(Math.abs(sum[i] - sum[i+1]) < min){
                min = Math.abs(sum[i] - sum[i+1]);
                curr1 = sum[i];
                curr2 = sum[i+1];
            }
        }
        if(mapHelper.get(curr1) < mapHelper.get(curr2)){
            result.add(mapHelper.get(curr1)+1);
            result.add(mapHelper.get(curr2));
        }
        else{
            result.add(mapHelper.get(curr2)+1);
            result.add(mapHelper.get(curr1)); 
        }
        return result;
    }
}

How to find the subarray that has sum closest to zero or a certain value t in O(nlogn)

10 Answers10