
What is the most effective way of finding a maximum value in a set of variables?

I have seen solutions, such as

private double findMax(double... vals) {
    double max = Double.NEGATIVE_INFINITY;

    for (double d : vals) {
        if (d > max) max = d;
    }
    return max;
}

But what would be the most effective algorithm for doing this?

MarkusWillson
  • hmm, seems like a good question. As I think you'll always have to iterate through all values in the set, I'd reduce it to "find the fastest way to do `if (d>max) max=d`". – rupps Dec 07 '14 at 01:19
  • In terms of runtime and Big-O notation, you have to touch/load and compare each element at least once. Your algorithm does exactly this, so if the complexity comes from loading or comparing, it seems quite optimal. Staying language-independent, I'd say that in many cases parallelization might help, e.g. on the GPU a parallel reduction operation is efficient. Everything else would be language and/or hardware (compiler) dependent, I think. – Thomas Dec 07 '14 at 01:25
  • Keep the set sorted and return the last item. – Ignacio Vazquez-Abrams Dec 07 '14 at 01:27
  • Can you parallelize this? How would you deal with the fact that at the end you'd have to somehow cross parallel pipes to find the absolute max? – rupps Dec 07 '14 at 01:28
  • @rupps: Using e.g. OpenMP >3.1 you can do this simply by adding `#pragma omp parallel for reduction(max : max)` before the `for`-loop. Therein the second max refers to `double max`. It will automatically cross pipes. – Thomas Dec 07 '14 at 01:33
  • Marginal performance improvement can be achieved by initializing the variable to the first value, double max = vals[0];, and starting the loop from the second element; this works in any practical implementation (C, C#, etc.). – Alexander Bell Dec 07 '14 at 01:54
  • @rupps: This falls into the category of "embarrassingly parallel", since it doesn't matter at all what order the comparisons are done in. You can give the second half of the list to another thread, then when the two threads have found the max in each half, select between them. – Ben Voigt Dec 07 '14 at 03:09
  • @Ben totally right sir, it's in fact one of the easiest forms of parallelization. I was confused, thinking about comparing all elements against each other, when only the final results are what matters. – rupps Dec 07 '14 at 03:12
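
A minimal sketch of the split-and-combine idea discussed in the comments above, using Java's parallel streams (assumes Java 8+; this is an illustration, not code from the thread). Each worker computes the maximum of its chunk and the partial results are combined at the end:

import java.util.stream.DoubleStream;

// Parallel reduction: partial maxima per chunk, combined into one result.
// Returns NEGATIVE_INFINITY for an empty argument list, like the original method.
private double findMaxParallel(double... vals) {
    return DoubleStream.of(vals)
                       .parallel()
                       .max()
                       .orElse(Double.NEGATIVE_INFINITY);
}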

3 Answers


You can't reduce the complexity below O(n) if the list is unsorted... but you can improve the constant factor by a lot. Use SIMD. For example, in SSE you would use the MAXPS instruction to perform four compare+select operations in a single cycle (MAXPD for doubles, two per instruction). Unroll the loop a bit to reduce the cost of loop control logic. And then outside the loop, find the max out of the four values trapped in your SSE register.

This gives a benefit for any size list... also using multithreading makes sense for really large lists.
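
A rough Java sketch of this idea using the incubating Vector API (jdk.incubator.vector, assumed to be available and enabled via --add-modules jdk.incubator.vector); treat it as an illustration of lanewise max plus a final reduction, not a tuned implementation:

import jdk.incubator.vector.DoubleVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

private static final VectorSpecies<Double> SPECIES = DoubleVector.SPECIES_PREFERRED;

private double findMaxVectorized(double... vals) {
    // One running maximum per SIMD lane.
    DoubleVector acc = DoubleVector.broadcast(SPECIES, Double.NEGATIVE_INFINITY);
    int i = 0;
    int upper = SPECIES.loopBound(vals.length);
    for (; i < upper; i += SPECIES.length()) {
        acc = acc.max(DoubleVector.fromArray(SPECIES, vals, i)); // lanewise compare+select
    }
    // Combine the per-lane maxima, then finish the scalar tail.
    double max = acc.reduceLanes(VectorOperators.MAX);
    for (; i < vals.length; i++) {
        if (vals[i] > max) max = vals[i];
    }
    return max;
}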

Ben Voigt

Assuming the list does not have its elements in any particular order, the algorithm you mentioned in your question is optimal. It must look at every element at least once, so it takes time directly proportional to the size of the list, O(n).

There is no algorithm for finding the maximum whose worst-case running time is better than O(n).

Proof: Suppose for a contradiction that there is an algorithm that finds the maximum of a list in less than O(n) time. Then there must be at least one element that it does not examine. If the algorithm selects this element as the maximum, an adversary may choose a value for the element such that it is smaller than one of the examined elements. If the algorithm selects any other element as the maximum, an adversary may choose a value for the element such that it is larger than the other elements. In either case, the algorithm will fail to find the maximum.

Peter Olson

EDIT: This was my attempted answer, but please look at the comments, where @BenVoigt proposes a better way to optimize the expression.


  • You need to traverse the whole list at least once,
  • so it'd be a matter of finding a more efficient expression for `if (d>max) max=d`, if one exists.

Assuming we need the general case where the list is unsorted (if we keep it sorted we'd just pick the last item, as @IgnacioVazquez points out in the comments), and researching a little about branch prediction (see Why is it faster to process a sorted array than an unsorted array?, in particular the 4th answer), it looks like

 if (d>max) max=d;

can be more efficiently rewritten as

 max=d>max?d:max; 

The reason is that the first statement is normally translated into a branch (this is totally compiler- and language-dependent, but it happens at least in C and C++, and even in a VM-based language like Java), while the second one is translated into a conditional move.

Modern processors pay a big penalty on branches when the prediction goes wrong (the execution pipeline has to be flushed), while a conditional move is a single operation that doesn't disturb the pipeline.

The random nature of the elements in the list (one can be greater or lesser than the current maximum with equal probability) will cause many branch predictions to go wrong.

Please refer to the linked question for a nice discussion of all this, together with benchmarks.
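
A minimal sketch of the full method written with the branchless form (whether the ternary actually becomes a conditional move depends on the compiler/JIT, as noted above):

// Same semantics as the original findMax; the update is a straight-line
// select instead of an explicit if.
private double findMaxBranchless(double... vals) {
    double max = Double.NEGATIVE_INFINITY;
    for (double d : vals) {
        max = d > max ? d : max; // candidate for a conditional move
    }
    return max;
}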

rupps
  • ...maybe if you turned off optimization. And no, both branches aren't taken with equal probability, unless your data comes from a Brownian process. – Ben Voigt Dec 07 '14 at 03:07
  • I'm curious about that, why not? If the list is totally unsorted, how could the compiler guess which way to favour? Isn't it exactly the case described in the question I linked? – rupps Dec 07 '14 at 03:14
  • By the time you reach the halfway point, there's a 50% chance that the condition will never be true again. If the max has an equal chance of living in each pigeonhole, then the probability of taking the branch at iteration `i` is `1 / (1 + i)`. Same as the chance that the max of the sublist (0,i) is in the final slot. Gets small, pretty fast (although not fast enough to converge). – Ben Voigt Dec 07 '14 at 03:17
  • hmmm you're right again ;) ... the question I linked compared with the value in the middle. Anyway, do you think replacing the branch with the conditional move would bring a (very marginal) optimization? (something that the compiler can automatically do in the optimization phase, agreed!) – rupps Dec 07 '14 at 03:20
  • Conditional moves are messy too, often the "best" version of something like this is `greater = -(d > max); max = d&greater | max&~greater;` Conveniently, that works really well with SIMD... at the end of the loop your SIMD register contains the maxes of each column, and then you need to start swizzling and combining columns. But probably the best approach would use saturating arithmetic: `max += saturate(d - max);` – Ben Voigt Dec 07 '14 at 03:24
  • wow you're a master in the subject, nice to learn something new! – rupps Dec 07 '14 at 03:28
  • Sorry I contradicted myself by saying "best" about two different things. The first is the "best" **generic** ternary operator implementation. The second is peculiar to maximum (or minimum). – Ben Voigt Dec 07 '14 at 03:30
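
A rough Java transcription of the mask trick from the comment above, restricted to integer values (for doubles you would have to reinterpret the bits, which is beyond this sketch; the method name is made up for illustration):

// Branchless max of two longs: build an all-ones/all-zeros mask from the
// comparison, then select with bitwise operations.
static long branchlessMax(long max, long d) {
    long greater = (d > max) ? -1L : 0L; // all ones when d > max, else all zeros
    return (d & greater) | (max & ~greater);
}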