Longest arithmetic progression with a hole

Question

The longest arithmetic progression subsequence problem is as follows. Given an array of integers A, devise an algorithm to find the longest arithmetic progression in it. In other words find a sequence i1 < i2 < … < ik, such that A[i1], A[i2], …, A[ik] form an arithmetic progression, and k is maximal. The following code solves the problem in O(n^2) time and space. (Modified from http://www.geeksforgeeks.org/length-of-the-longest-arithmatic-progression-in-a-sorted-array/ . )

#!/usr/bin/env python
import sys

def arithmetic(arr):
    n = len(arr)
    if (n<=2):
        return n

    llap = 2


    L = [[0]*n for i in xrange(n)]
    for i in xrange(n):
        L[i][n-1] = 2

    for j in xrange(n-2,0,-1):
        i = j-1
        k = j+1
        while (i >=0 and k <= n-1):
            if (arr[i] + arr[k] < 2*arr[j]):
                k = k + 1
            elif (arr[i] + arr[k] > 2*arr[j]):
                L[i][j] = 2
                i -= 1
            else:

                L[i][j] = L[j][k] + 1                   
                llap = max(llap, L[i][j]) 
                i = i - 1
                k = j + 1

        while (i >=0):
            L[i][j] = 2
            i -= 1
    return llap

arr = [1,4,5,7,8,10]  
print arithmetic(arr)

This outputs 4.

However I would like to be able to find arithmetic progressions where up to one value is missing. So if arr = [1,4,5,8,10,13] I would like it to report that there is a progression of length 5 with one value missing.

Can this be done efficiently?

@Keyser There is a sequence 1, 4, 7, 10, 13 but the 7 is missing. — Simd, Aug 13 '13 at 19:40
Oh, I didn't realize you allowed gaps between the numbers :) — keyser, Aug 13 '13 at 19:41
What if you set up your table so that you don't just index by "last index" and "last element", but also whether or not the sequence already has a gap? — Dennis Meng, Aug 13 '13 at 21:02
Had to wiki "[arithmetic progression](http://en.wikipedia.org/wiki/Arithmetic_progression)..." — 2rs2ts, Aug 13 '13 at 23:47
@DennisMeng That was my first thought too, but there's going to be some trickiness when the gap is in between the first two elements. [1,5] could be the first two elements of a 4-list, or the first and third elements of a 2-list with one element missing already, and you only have one spot in your 2D array to store both pieces of information. — dfan, Aug 14 '13 at 01:43
See also http://stackoverflow.com/questions/18159911 for various different algorithms: one of them might be easily adaptable. — Armin Rigo, Aug 14 '13 at 09:04
@dfan That's why I was suggesting some flag. That way, `[1,5]` with the flag indicates a 2-list, while `[1,5]` without it is a 4-list. — Dennis Meng, Aug 14 '13 at 16:04
@DennisMeng Yes, but there is only room in L[i][j] (where arr[i] == 1 and arr[j] == 5) for one "sequence in progress". This is not a dealbreaker but it does mean that you have to store more information than just "3-long sequence with the flag" or "2-long sequence without flag", since both possibilities are still in play. — dfan, Aug 14 '13 at 16:44
@dfan I'm suggesting a `[][][]`, not a `[][]`, where `L[i][j][1]` would be the sequence in progress that already has a gap, and `L[i][j][0]` would be the sequence in progress without a gap. (I guess saying "flag" was misleading) — Dennis Meng, Aug 14 '13 at 16:46
@DennisMeng OK, then we agree entirely! I thought you were just adding a bool. In retrospect I see that in your first comment you did explicitly say you were indexing by "gapness". — dfan, Aug 14 '13 at 16:55

score 1 · Accepted Answer · edited May 23 '17 at 12:11

Adapted from my answer to Longest equally-spaced subsequence. n is the length of A, and d is the range, i.e. the largest item minus the smallest item.

A = [1, 4, 5, 8, 10, 13]    # in sorted order
Aset = set(A)

for d in range(1, 13):
    already_seen = set()
    for a in A:
        if a not in already_seen:
            b = a
            count = 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            # if there is a hole to jump over:
            if b + 2 * d in Aset:
                b += 2 * d
                count += 1
                while b + d in Aset:
                    b += d
                    count += 1
                    # don't record in already_seen here
            print "found %d items in %d .. %d" % (count, a, b)
            # collect here the largest 'count'

I believe that this solution is still O(n*d), simply with larger constants than looking without a hole, despite the two "while" loops inside the two nested "for" loops. Indeed, fix a value of d: then we are in the "a" loop that runs n times; but each of the inner two while loops run at most n times in total over all values of a, giving a complexity O(n+n+n) = O(n) again.

Like the original, this solution is adaptable to the case where you're not interested in the absolute best answer but only in subsequences with a relatively small step d: e.g. n might be 1'000'000, but you're only interested in subsequences of step at most 1'000. Then you can make the outer loop stop at 1'000.

Longest arithmetic progression with a hole

1 Answers1