How to decide if randomised algorithm is OK to use?

Question

From what I understand, randomised algorithm could give wrong answer.For example, using contraction algorithm to solve graph min-cut problem, you need to run the algorithm n^2*ln(n) times so that the possibility of failing to get the correct answer is at most 1/n. No matter how small the possibility of failure is, the answer could be incorrect, so when is the right time that we allow the incorrect answer?

This is a really interesting question, but I think it might be a bit too open-ended for Stack Overflow. This depends a lot on what the cost of getting a wrong or slow answer is. In a nuclear reactor, you can't afford wrong answers in safety-critical systems. In a database, it's okay if things are sometimes a bit slower than usual. — templatetypedef, Nov 06 '16 at 19:37
I think is fine. It's a real programming/engineering question: How to analyze optimality criteria to determine whether a randomized algorithm is usable for a given problem. — Gene, Nov 06 '16 at 21:11
"_when is the right time that we allow the incorrect answer?_". Define "incorrect". Define your tolerance to incorrectness (and this really depends on the problem at hand, as mentioned by @templatetypedef but also on the context and on your system of values ...). If you can guarantee that the incorrectness lies within your tolerance, then it's "ok" to use a randomized algorithm. — user2314737, Nov 07 '16 at 09:19

score 0 · Accepted Answer · edited May 23 '17 at 12:01

To begin with, I think you need to differentiate between different classes of randomized algorithms:

A Monte Carlo algorithm is an algorithm which is random w.r.t. correctness. The randomized min-cut algorithm, from your question, is an example of such an algorithm.
A Las Vegas algorithm is an algorithm which is random w.r.t. running time. Randomized quicksort, for example, is such an algorithm.

You seem to mean Monte-Carlo algorithms in your question.

The question of whether a Monte-Carlo algorithm is suitable to you, probably can't be answered objectively, because it is based on something like the ecomonic theory of utility. Given two algorithms, A and B, then each invocation of A or B takes some time t and gives you the result whose correctness is c. The utility U(t, c) is a random variable, and only you can determine whether the distribution of U_A(T, C) is better or worse than U_B(T, C). Some examples, where algorithm A performs twice as fast as B, but errs with probability 1e-6:

If these are preference recommendations on a website, then it might be worth it for you to have your website twice as responsive as that of a competitor, at the risk that, rarely, a client gets wrong recommendations.
If these are control systems for a nuclear reactor (to borrow from TemplateTypedef's comment), then a slight chance of failure might not be worth the time saving (e.g., you probably would be better investing in a processor twice as fast running the slower algorithm).

The two examples above show that each of the two choices might be correct for different settings. In fact, utility theory rarely shows sets of choices that are clearly wrong. In the introduction to the book Randomized Algorithms by Motwani and Raghavan, however, the authors do give such an example for the fallacy of avoiding Monte-Carlo algorithms. The probability of a CPU malfunctioning due to cosmic radiation is some α (whose value I forget). Thus avoiding running a Monte-Carlo algorithm with probability of error much lower than α, is probably simply irrational.

Gene · Answer 2 · 2016-11-06T22:34:34.487

You'll always need to analyze the properties of the algorithm and decide if the risk of a non-optimal answer is bearable in your application. (If the answer is Boolean, then "non-optimal" is the same as "wrong.")

There are many kinds of programming problems where some answer that's close to optimal and obtained in reasonable time is much better than the optimal answer provided too late or not at all.

The Traveling Salesman problem is an example. If you are Walmart and need to plan delivery routes each night for given sets of cities, getting a route that's close to optimal is much better than no route or a naively chosen one or the best possible route obtained 2 days from now.

There are many kinds of guarantees provided by randomized algorithms. They often have the form error <= F(cost), where error and cost can be almost anything. The cost may be expressed in run time or how many repeat runs are spent looking for better answers. Space may also figure in cost. The error may be probability of a wrong 1/0 answer, a distance metric from an optimal result, a discrete count of erroneous components, etc., etc.

Sometimes you just have to live with a maybe-wrong answer because there's no useful alternative. Primality testing on big numbers is in this category. Though there are polynomial time deterministic tests, they are still much slower than a probabilistic test that produces the correct answer for all practical purposes.

For example, if you have a Boolean randomized function where True results are always correct, but False are wrong 50% of the time, then you are in pretty good shape. (The Miller-Rabin primality test is actually better than this.)

Suppose you can afford to run the algorithm 40 times. If any of the runs says False, you know the answer is False. If they're all True then the probability of that the real answer if false is roughly 2^40 = 1/(1 trillion).

Even in safety-critical applications, this may be a fine result. The chance of being hit by lightning in a lifetime is about 1/10,000. We all live with that and don't give it a second thought.

How to decide if randomised algorithm is OK to use?

2 Answers2