
So far I haven't found any algorithm that solves this task: "An element is considered a peak if and only if (A[i] > A[i+1]) && (A[i] > A[i-1]), not taking into account the edges of the (1D) array."

I know that the common approach for this problem is "Divide & Conquer", but that applies to the variant in which the edges also count as peaks.

The complexity I need to reach for this exercise is O(log(n)).

[Image: sample arrays]

From the image above it is clear to me why it is O(log(n)), but without the edges the complexity changes to O(n): in the lower picture I run the recursive function on each side of the middle element, which makes it run in O(n) in the worst case (when the peak is near an edge). In that case, why not use a simple linear scan like this:

public static int GetPeak(int[] A)
{
    if (A.length <= 2) // too short to contain a peak under this definition
    {
        return -1;
    }
    // the first and last elements can never be peaks
    for (int i = 1; i < A.length - 1; i++)
    {
        if (A[i] > A[i + 1] && A[i] > A[i - 1])
        {
            return A[i]; // return the first peak found
        }
    }
    return -1; // no peak in the array
}

The common algorithm is written here: http://courses.csail.mit.edu/6.006/spring11/lectures/lec02.pdf, but as I said before, it doesn't satisfy the terms of this exercise.
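For reference, the edge-inclusive algorithm from those notes can be sketched roughly like this (my own Java rendering, not code from the notes; it uses the weaker definition in which A[mid] >= its neighbours counts as a peak and edges qualify):

```java
// Rough sketch of the classic O(log n) peak finder in which EDGES
// COUNT as peaks. It always moves toward a strictly larger
// neighbour, so the index it stops at is >= both of its existing
// neighbours.
public class ClassicPeak {
    public static int classicPeak(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] < a[mid + 1]) {
                lo = mid + 1; // some peak must lie to the right of mid
            } else {
                hi = mid;     // a[mid] >= a[mid+1]: a peak lies at mid or to its left
            }
        }
        return lo; // index of a peak under the edge-inclusive definition
    }

    public static void main(String[] args) {
        System.out.println(classicPeak(new int[]{1, 2, 3, 1})); // 2
        System.out.println(classicPeak(new int[]{3, 2, 1}));    // 0 (an edge "peak")
    }
}
```

This is exactly the strategy that breaks once edges are disqualified: the loop is allowed to terminate at index 0 or a.length - 1, which this exercise does not accept.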

The function should return only one peak; if none exists, return -1.

Also, my apologies if the post is worded incorrectly due to the language barrier (I am not a native English speaker).

ProgC
  • You can get it in O(n), right? – zenwraight Aug 29 '18 at 14:54
  • Just iterate with a for loop and for each i check the above condition: whether the elements to its left and right are both less than it – zenwraight Aug 29 '18 at 14:55
  • @zenwraight I understand, but using this approach gets me to O(n) and not to O(log(n)) as it should be... is there a mistake in this exercise saying that it needs to be O(log(n))? – ProgC Aug 29 '18 at 14:57
  • What do you know about the data in the array? Can there be more than one peak? – Alex M Aug 29 '18 at 14:57
  • @AlexM I will rewrite the post. I forgot to say to return only one peak, not all the peaks – ProgC Aug 29 '18 at 15:00
  • Why do you have so many sentence fragments in separate paragraphs? – melpomene Aug 29 '18 at 15:04
  • @melpomene if you examine the post closely you may notice why (hint: read the last sentence) – ProgC Aug 29 '18 at 15:09
  • If you cannot make any hypothesis on the array data (more precisely: the monotonicity of the number sequence), then you **have** to scan the whole array until you find a local peak (that's O(n)). If you're looking for a O(log(n)) algorithm, then there must be more details that you haven't specified – Alex M Aug 29 '18 at 15:25
  • @ProgC It has nothing to do with wording or English language proficiency. Just don't hit enter twice every couple of words. – melpomene Aug 29 '18 at 15:29
  • @AlexM All the details are there. The arrangement of the elements is unknown – ProgC Aug 29 '18 at 15:30
  • @melpomene In my opinion, the clarity of the post doesn't diminish after hitting enter – ProgC Aug 29 '18 at 15:32
  • With your definition of peak I don't see how you can do better than O(n) no matter what you do with the edges. The worst case seems to be a plateau with a single peak hidden somewhere: `[1, 1, 1, 1, 5, 1]`, `[1, 5, 1, 1, 1, 1]`, `[1, 1, 5, 1, 1, 1]` or maybe no peak at all: `[1, 1, 1, 1, 1, 1]`. You can't apply divide and conquer because no matter in which order you examine the elements, the single peak could hide anywhere, so you need to check all of them. – melpomene Aug 29 '18 at 15:33
  • "it is clear to me why it is O(log(n))" : could you clarify why? – Alex M Aug 29 '18 at 15:43
  • @AlexM "The most common attributes of logarithmic running-time function are that: -the choice of the next element on which to perform some action is one of several possibilities, and -only one will need to be chosen." https://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly in the worst case scenario I need to check all elements – ProgC Aug 29 '18 at 15:45
  • @AlexM Maybe he means O(log(n)) in the average case. Given his implementation, it's possible his algorithm could finish on the first iteration of the loop. So, best case is O(1) and worst-case is O(n). Given random data, it could be that O(log(n)) is the average case – Woody1193 Aug 29 '18 at 15:53
  • If you know nothing about the possible values in your array, once you split it, how can you know whether you need to search the left or right sub-array? You'd need to search both, and you need to also check that you did not split right at the peak. In the general case (random data), scanning from left to right is the best approach. – Alex M Aug 29 '18 at 15:54
  • @Woody1193 So, given the information, is there not a strict answer? – ProgC Aug 29 '18 at 15:55
  • The answer is actually very strict: if edges do not count as peaks, and if the data is random, then your average complexity is O(n), with best-case O(1) and worst-case O(n). – Alex M Aug 29 '18 at 16:31
  • @AlexM I would agree, except that he has the `break` in there. If dynamic programming is employed, then even if he had to go into both left and right branches on his recursion, they may return immediately before any real work is done, especially if he is running sequentially. I don't have a mathematical proof as I imagine such would rely on statistics that I am not proficient with, but I still think you could make the case for "better than O(n)" at least. – Woody1193 Aug 29 '18 at 18:08
  • @AlexM Again, this relies on assumptions about the data, similar to those that allow for O(n) sorting, so an examination of the algorithm isn't sufficient – Woody1193 Aug 29 '18 at 18:11

2 Answers


I think what you're looking for is a dynamic programming approach, utilizing divide-and-conquer. Essentially, you would have a default value for your peak which you would overwrite when you found one. If you could check at the beginning of your method and only run operations if you hadn't found a peak, then your O() notation would look something like O(pn), where p is the probability that any given element of your array is a peak, which depends on how your data is distributed (or not). For instance, if your array only has values between 1 and 5 and they're distributed equally, then the probability works out to 0.24 (for each value v, both neighbours must be strictly smaller, giving a sum over v of (1/5) * ((v - 1) / 5)^2 = 30/125). Note that this still appears to be equivalent to O(n). However, if you require that your data values are unique in the array, then your probability is equal to:

p = 2 * sum( [ choose(x - 1, 2) for x in 3:n ] ) / (n * (n - 1) * (n - 2))
p = 2 * sum( [ ((x - 1)! / (2 * (x - 3)!)) for x in 3:n ] ) / (n * (n - 1) * (n - 2))
p = sum( [ (x - 1) * (x - 2) for x in 3:n ] ) / (n * (n - 1) * (n - 2))
p = (n * (n - 1) * (n - 2) / 3) / (n * (n - 1) * (n - 2))
p = 1 / 3

So, this seems like a lot, but the sum collapses to n * (n - 1) * (n - 2) / 3, and p comes out to exactly 1/3 for any n >= 3.
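Both numbers are easy to sanity-check by brute-force enumeration (my own snippet, not part of the original answer):

```java
// Brute-force check of the two probabilities quoted above.
public class PeakOdds {
    public static void main(String[] args) {
        // Values drawn uniformly from {1..5}, duplicates allowed:
        // count strict middle peaks among all 5^3 ordered triples.
        int peaks = 0, total = 0;
        for (int a = 1; a <= 5; a++)
            for (int b = 1; b <= 5; b++)
                for (int c = 1; c <= 5; c++) {
                    total++;
                    if (b > a && b > c) peaks++;
                }
        System.out.println(peaks + "/" + total); // 30/125 = 0.24

        // Three DISTINCT values: the middle one is the maximum in
        // exactly 2 of the 3! = 6 orderings, i.e. with probability 1/3.
        int[][] perms = {{1,2,3},{1,3,2},{2,1,3},{2,3,1},{3,1,2},{3,2,1}};
        int middleMax = 0;
        for (int[] p : perms)
            if (p[1] > p[0] && p[1] > p[2]) middleMax++;
        System.out.println(middleMax + "/" + perms.length); // 2/6
    }
}
```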

So, if we have a 1/3 chance of finding a peak at any given interior element, then at the bottom level of your recursion each check succeeds with probability 1/3, so the expected number of checks before you find a peak is about 3, which is constant. However, you still have to get to the bottom level of your recursion before you can do the comparisons, and that requires O(log(n)) time. So, a divide-and-conquer approach should run in O(log(n)) time in the average case, with O(n) in the worst case (the recursion does constant work per node and can end up visiting every element).
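As a rough sketch of what that recursion could look like (my own code with hypothetical names, not from the answer; the halves overlap by one element so that no interior index is skipped):

```java
// Sketch of the early-exit divide-and-conquer described above.
// It checks the middle element first; only if that is not a peak
// does it recurse into the two (overlapping) halves. The average
// case can stop early; the worst case still touches every element.
public class DividePeak {
    static int findPeak(int[] a, int lo, int hi) { // hi is exclusive
        if (hi - lo < 3) return -1; // too short for an interior peak
        int mid = (lo + hi) / 2;
        if (mid > 0 && mid < a.length - 1
                && a[mid] > a[mid - 1] && a[mid] > a[mid + 1]) {
            return a[mid]; // early exit: peak found
        }
        int left = findPeak(a, lo, mid + 1); // halves overlap at mid
        if (left != -1) return left;
        return findPeak(a, mid, hi);
    }

    public static void main(String[] args) {
        System.out.println(findPeak(new int[]{1, 5, 1, 1, 1, 1}, 0, 6)); // 5
        System.out.println(findPeak(new int[]{1, 1, 1, 1, 1, 1}, 0, 6)); // -1
    }
}
```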

Woody1193

If you cannot make any assumptions about your data (monotonicity of the number sequence, number of peaks), and if edges cannot count as peaks, then you cannot hope for a better average performance than O(n). Your data is randomly distributed, and any value can be a peak. You have to examine them one by one, and there is no correlation between the values.

Accepting edges as potential candidates for peaks changes everything: you know there will always be at least one peak, and a good enough strategy is to always search in the direction of increasing values until you start to go down or you reach an edge (this is the approach of the document you linked). That strategy is O(log(n)) because you use binary search to look for a local max.

Alex M