Find minimum cost to convert array to arithmetic progression

Question

I recently encountered this question in an interview. I couldn't really come up with an algorithm for this.

Given an array of unsorted integers, we have to find the minimum cost in which this array can be converted to an Arithmetic Progression where a cost of 1 unit is incurred if any element is changed in the array. Also, the value of the element ranges between (-inf,inf).

I sort of realised that DP can be used here, but I couldn't solve the equation. There were some constraints on the values, but I don't remember them. I am just looking for high level pseudo code.

Is swapping allowed or is it supposed to be element-by-element arithmetic manipulation only? — sunny, Jul 31 '15 at 16:29

Louis Ricci · Answer 1 · 2015-08-03T12:15:57.273

2

EDIT Here's a correct solution, unfortunately, while simple to understand it's not very efficient at O(n^3).

function costAP(arr) {
    if(arr.length < 3) { return 0; }
    var minCost = arr.length;
    for(var i = 0; i < arr.length - 1; i++) {
        for(var j = i + 1; j < arr.length; j++) {
            var delta = (arr[j] - arr[i]) / (j - i);
            var cost = 0;
            for(var k = 0; k < arr.length; k++) {
                if(k == i) { continue; }
                if((arr[k] + delta * (i - k)) != arr[i]) { cost++; }
            }
            if(cost < minCost) { minCost = cost; }
        }
    }
    return minCost;
}

Find the relative delta between every distinct pair of indices in the array
Use the relative delta to test the cost of transforming the whole array to AP using that delta
Return the minimum cost

edited Aug 03 '15 at 12:15

answered Jul 31 '15 at 18:03

Louis Ricci

20,804
5
48
62

Your insight about looking for an existing arithmetic progression is right, but I'm afraid nothing confines it to a single run, so adjacent differences aren't very informative. E.g., in [1, 42, 3, 69, 5], the 42 and the 69 can be changed for a cost of just 2. -1 for now. – j_random_hacker Aug 01 '15 at 19:15
@j_random_hacker - I've added an always correct solution – Louis Ricci Aug 03 '15 at 12:21

score 2 · Answer 2 · edited May 23 '17 at 10:26

Louis Ricci had the right basic idea of looking for the largest existing arithmetic progression, but assumed that it would have to appear in a single run, when in fact the elements of this progression can appear in any subset of the positions, e.g.:

1 42 3 69 5 1111 2222 8

requires just 4 changes:

  42   69   1111 2222
1    3    5           8

To calculate this, notice that every AP has a rightmost element. We can suppose each element i of the input vector to be the rightmost AP position in turn, and for each such i consider all positions j to the left of i, determining the step size implied for each (i, j) combination and, when this is integer (indicating a valid AP), add one to the the number of elements that imply this step size and end at position i -- since all such elements belong to the same AP. The overall maximum is then the longest AP:

struct solution {
    int len;
    int pos;
    int step;
};

solution longestArithProg(vector<int> const& v) {
    solution best = { -1, 0, 0 };

    for (int i = 1; i < v.size(); ++i) {
        unordered_map<int, int> bestForStep;
        for (int j = 0; j < i; ++j) {
            int step = (v[i] - v[j]) / (i - j);
            if (step * (i - j) == v[i] - v[j]) {
                // This j gives an integer step size: record that j lies on this AP
                int len = ++bestForStep[step];
                if (len > best.len) {
                    best.len = len;
                    best.pos = i;
                    best.step = step;
                }
            }
        }
    }

    ++best.len;     // We never counted the final element in the AP
    return best;
}

The above C++ code uses O(n^2) time and O(n) space, since it loops over every pair of positions i and j, performing a single hash read and write for each. To answer the original problem:

int howManyChangesNeeded(vector<int> const& v) {
    return v.size() - longestArithProg(v).len;
}

this is a much more efficiient answer – Louis Ricci Aug 03 '15 at 12:46 — Louis Ricci, Aug 03 '15 at 12:46

score 1 · Answer 3 · edited May 23 '17 at 11:51

This problem has a simple geometric interpretation, which shows that it can be solved in O(n^2) time and probably can't be solved any faster than that (reduction from 3SUM). Suppose our array is [1, 2, 10, 3, 5]. We can write that array as a sequence of points

(0,1), (1,2), (2,10), (3,3), (4,5)

in which the x-value is the index of the array item and the y-value is the value of the array item. The question now becomes one of finding a line which passes the maximum possible number of points in that set. The cost of converting the array is the number of points not on a line, which is minimized when the number of points on a line is maximized.

A fairly definitive answer to that question is given in this SO posting: What is the most efficient algorithm to find a straight line that goes through most points?

The idea: for each point P in the set from left to right, find the line passing through that point and a maximum number of points to the right of P. (We don't need to look at points to the left of P because they would have been caught in an earlier iteration).

To find the maximum number of P-collinear points to the right of P, for each such point Q calculate the slope of the line segment PQ. Tally up the different slopes in a hash map. The slope which maps to the maximum number of hits is what you're looking for.

Technical issue: you probably don't want to use floating point arithmetic to calculate the slopes. On the other hand, if you use rational numbers, you potentially have to calculate the greatest common divisor in order to compare fractions by comparing numerator and denominator, which multiplies running time by a factor of log n. Instead, you should check equality of rational numbers a/b and c/d by testing whether ad == bc.

The SO posting referenced above gives a reduction from 3SUM, i.e., this problem is 3SUM-hard which shows that if this problem could be solved substantially faster than O(n^2), then 3SUM could also be solved substantially faster than O(n^2). This is where the condition that the integers are in (-inf,inf) comes in. If it is known that the integers are from a bounded set, the reduction from 3SUM is not definitive.

An interesting further question is whether the idea in the Wikipedia for solving 3SUM in O(n + N log N) time when the integers are in the bounded set (-N,N) can be used to solve the minimum cost to convert an array to an AP problem in time faster than O(n^2).

You look to be describing much the same algorithm as I gave, except that (a) I examine points to the left (rather than right) of P, and (b) we can ignore lines with non-integral slope (assuming that elements have to remain integers) so that's what I do, instead of using FP or rational numbers. Also *your* reduction is currently the wrong way around -- it needs to convert *from* the linked straight-line-that-goes-through-max-points problem *to* the special case you give (namely, the same problem but with the x co-ords constrained to 1, 2, 3 , ...). — j_random_hacker, Aug 03 '15 at 17:42
@j_random_hacker, of course, you are right about rational slopes! I should have thought about that. I believe you are right as well about the direction of the reduction, though it's straightforward if we allow arrays with undefined elements (corresponding to missing points) the cost of changing which is 0, I believe. I'll think about it some more. — Edward Doolittle, Aug 03 '15 at 18:22

Tony Tuttle · Answer 4 · 2015-07-31T16:41:05.937

0

Given the array a = [a_1, a_2, ..., a_n] of unsorted integers, let diffs = [a_2-a_1, a_3-a_2, ..., a_n-a_(n-1)].

Find the maximum occurring value in diffs and adjust any values in a necessary so that all neighboring values differ by this amount.

edited Jul 31 '15 at 16:41

answered Jul 31 '15 at 16:35

Tony Tuttle

612
7
23

The part about "adjust any values in `a` necessary..." is kind of important. How do you identify how many values to adjust? I'm thinking of a case like [1, 2, 4, 6, 8, 19, 21, 23, 25, 0, 0, 0] where there are two or more subsequences that match the desired diff value. – Erick G. Hagstrom Jul 31 '15 at 18:27
Your algorithm needs a little more hashing out: a = [1,2,3,5,7,8,9,10], diffs = [1,1,2,2,1,1,1], maxRecurringDiff = 5 (5 occurances of 1). If you adjust values starting from the beginning then you end up with a cost of 5 but if you adjust values starting at the end your cost is 4 (change 1,2,3,4 => 3,4,5,6). – Louis Ricci Jul 31 '15 at 18:31
Both valid points that I overlooked. I see that the solution that @LouisRicci posted addresses these issues. – Tony Tuttle Jul 31 '15 at 19:25

score 0 · Answer 5 · answered Jul 31 '15 at 18:58

Interestingly,even I had the same question in my campus recruitment test today.While doing the test itself,I realised that this logic of altering elements based on most frequent differences between 2 subsequent elements in the array fails in some cases. Eg-4,5,8,9 .According to the logic of a2-a1,a3-a2 as proposed above,answer shud be 1 which is not the case. As you suggested DP,I feel it can be on the lines of considering 2 values for each element in array-cost when it is modified as well as when it is not modified and return minimum of the 2.Finally terminate when you reach end of the array.

Find minimum cost to convert array to arithmetic progression

5 Answers5

Linked