3

The longest arithmetic progression subsequence problem is as follows. Given an array of integers A, devise an algorithm to find the longest arithmetic progression in it. In other words find a sequence i1 < i2 < … < ik, such that A[i1], A[i2], …, A[ik] form an arithmetic progression, and k is maximal. The following code solves the problem in O(n^2) time and space. (Modified from http://www.geeksforgeeks.org/length-of-the-longest-arithmatic-progression-in-a-sorted-array/ . )

#!/usr/bin/env python
import sys

def arithmetic(arr):
    n = len(arr)
    if (n<=2):
        return n

    llap = 2


    L = [[0]*n for i in xrange(n)]
    for i in xrange(n):
        L[i][n-1] = 2

    for j in xrange(n-2,0,-1):
        i = j-1
        k = j+1
        while (i >=0 and k <= n-1):
            if (arr[i] + arr[k] < 2*arr[j]):
                k = k + 1
            elif (arr[i] + arr[k] > 2*arr[j]):
                L[i][j] = 2
                i -= 1
            else:

                L[i][j] = L[j][k] + 1                   
                llap = max(llap, L[i][j]) 
                i = i - 1
                k = j + 1

        while (i >=0):
            L[i][j] = 2
            i -= 1
    return llap

arr = [1,4,5,7,8,10]  
print arithmetic(arr)

This outputs 4.

However I would like to be able to find arithmetic progressions where up to one value is missing. So if arr = [1,4,5,8,10,13] I would like it to report that there is a progression of length 5 with one value missing.

Can this be done efficiently?

Simd
  • 19,447
  • 42
  • 136
  • 271
  • @Keyser There is a sequence 1, 4, 7, 10, 13 but the 7 is missing. – Simd Aug 13 '13 at 19:40
  • Oh, I didn't realize you allowed gaps between the numbers :) – keyser Aug 13 '13 at 19:41
  • 1
    What if you set up your table so that you don't just index by "last index" and "last element", but also whether or not the sequence already has a gap? – Dennis Meng Aug 13 '13 at 21:02
  • Had to wiki "[arithmetic progression](http://en.wikipedia.org/wiki/Arithmetic_progression)..." – 2rs2ts Aug 13 '13 at 23:47
  • @DennisMeng That was my first thought too, but there's going to be some trickiness when the gap is in between the first two elements. [1,5] could be the first two elements of a 4-list, or the first and third elements of a 2-list with one element missing already, and you only have one spot in your 2D array to store both pieces of information. – dfan Aug 14 '13 at 01:43
  • See also http://stackoverflow.com/questions/18159911 for various different algorithms: one of them might be easily adaptable. – Armin Rigo Aug 14 '13 at 09:04
  • @dfan That's why I was suggesting some flag. That way, `[1,5]` with the flag indicates a 2-list, while `[1,5]` without it is a 4-list. – Dennis Meng Aug 14 '13 at 16:04
  • @DennisMeng Yes, but there is only room in L[i][j] (where arr[i] == 1 and arr[j] == 5) for one "sequence in progress". This is not a dealbreaker but it does mean that you have to store more information than just "3-long sequence with the flag" or "2-long sequence without flag", since both possibilities are still in play. – dfan Aug 14 '13 at 16:44
  • @dfan I'm suggesting a `[][][]`, not a `[][]`, where `L[i][j][1]` would be the sequence in progress that already has a gap, and `L[i][j][0]` would be the sequence in progress without a gap. (I guess saying "flag" was misleading) – Dennis Meng Aug 14 '13 at 16:46
  • @DennisMeng OK, then we agree entirely! I thought you were just adding a bool. In retrospect I see that in your first comment you did explicitly say you were indexing by "gapness". – dfan Aug 14 '13 at 16:55

1 Answers1

1

Adapted from my answer to Longest equally-spaced subsequence. n is the length of A, and d is the range, i.e. the largest item minus the smallest item.

A = [1, 4, 5, 8, 10, 13]    # in sorted order
Aset = set(A)

for d in range(1, 13):
    already_seen = set()
    for a in A:
        if a not in already_seen:
            b = a
            count = 1
            while b + d in Aset:
                b += d
                count += 1
                already_seen.add(b)
            # if there is a hole to jump over:
            if b + 2 * d in Aset:
                b += 2 * d
                count += 1
                while b + d in Aset:
                    b += d
                    count += 1
                    # don't record in already_seen here
            print "found %d items in %d .. %d" % (count, a, b)
            # collect here the largest 'count'

I believe that this solution is still O(n*d), simply with larger constants than looking without a hole, despite the two "while" loops inside the two nested "for" loops. Indeed, fix a value of d: then we are in the "a" loop that runs n times; but each of the inner two while loops run at most n times in total over all values of a, giving a complexity O(n+n+n) = O(n) again.

Like the original, this solution is adaptable to the case where you're not interested in the absolute best answer but only in subsequences with a relatively small step d: e.g. n might be 1'000'000, but you're only interested in subsequences of step at most 1'000. Then you can make the outer loop stop at 1'000.

Community
  • 1
  • 1
Armin Rigo
  • 12,048
  • 37
  • 48