16

When should the Monte-Carlo method be used?

For example, why did Joel decide to use the Monte-Carlo method for Evidence Based Scheduling instead of methodically processing all user data for the past year?

Gili
  • 86,244
  • 97
  • 390
  • 689
  • what do you mean by 'methodically processing' the user data? random samples is a pretty basic way to add up bell curves, how would you do it? – amwinter May 29 '10 at 11:28
  • 2
    Couldn't you process all available data and come up with the same statistics saying "75% of data is greater than X, 50% of data is greater than Y, and 25% of data is greater than Z"? – Gili May 30 '10 at 17:47

5 Answers5

18

Monte Carlo methods are commonly used when the dimensionality of the problem is too high for traditional schemes. A great introductory paper on the subject is Persi Diaconis' The Markov Chain Monte Carlo Revolution.

jmbr
  • 3,298
  • 23
  • 23
  • Interesting paper, but I quickly got lost in the details. – Gili Jun 23 '10 at 00:02
  • Nice looking paper, but I have to say that to a particle physicist using "Revolution" in the title of a MC paper written after 2009 seems a little strange. We've been doing this for long enough that Metropolis *is* a traditional scheme. – dmckee --- ex-moderator kitten Apr 05 '11 at 23:19
  • @dmckee That paper was aimed, I believe, towards a broad audience of mathematicians comprising not only applied mathematicians or statisticians but also people involved in other areas such as algebra, analysis, etc. where Monte Carlo methods are not so widely known. – jmbr Apr 07 '11 at 15:23
13

Suppose that you want to estimate some quantity of interest. In the Joel's example 'ship date' is what you want to estimate. In most such situations, there are random factors that impact our estimates.

When you have a random quantity, you typically wants to know its mean and the standard deviation so that you can take appropriate actions. In simple situations, you can model the quantity as a standard distribution (e.g., normal distribution) for which analytical formulas exist for the mean and the standard deviation. However, there exist many situations where analytical formulas do not exist. In such situations, instead of an analytic solution for the mean and the standard deviation, we resort to simulation. The idea is:

Step 1: Generate factors that impact the quantity of interest using appropriate distributions

Step 2: Compute quantity of interest

Repeat steps 1 and 2 many times and compute the empirical average and standard deviation for what you want to know.

The above is by far the typical application of monte carlo application. See the wikipedia link provided by Jarrod for several such applications and some examples of interesting applications where there is no inherent randomness (e.g., estimation of pi).

vad
  • 661
  • 1
  • 5
  • 12
  • I like your answer except that the steps you give are very vague. Can you make them more precise somehow? – Gili Jun 23 '10 at 00:04
  • 1
    Well, monte carlo is a vast area with lots of applications. As an example, suppose that you want have some data about various project characteristics (e.g., no of developers, target OS etc) and ship times (e.g., 3 months, 6 months etc). You may already know the relationship between project characteristics and ship times. For example, Ship Times ~ N(mu,sigma^2) I(Ship Times >0) where N(.) indicates a normal distribution, mu and sigma are function of project characteristics and I(Ship Times > 0) expresses the fact that ship times cannot be negative. – vad Jun 23 '10 at 02:10
  • 1
    You may want to know the consequences of changing some project parameter (e.g., increase no of developers) on ship times. Unfortunately, no closed form expression exists for the mean of a truncated normal. So, what you would do is: Step 1: Generate a truncated normal using rejection sampling or inverse transform method Step 2. Store ship time (in this case step 2 involves no computation) Repeat steps 1 and 2 N times and compute the mean and std dev of ship times that you stored in step 2. The above assumes that you know the relationship between the project parameters and mu and sigma. – vad Jun 23 '10 at 02:13
  • 1
    If you do not know that relationship then you would of course need to model that relationship and estimate the relevant parameters. For example, you could assume that mu = beta1 * (No of developers) + beta2 * (No of meetings with clients) etc and estimate beta1, beta2 etc. Hope that helps. – vad Jun 23 '10 at 02:15
3

Wikipedia has a good article on monte carlo simulation methods. I've used monte carlo on a few occasions - in a nutshell MC methods tend to give accurate-ish answers when trying to project results using sample sets that are pretty much random and somebody would typically use intuition to try and guess at a trend. Unfortunately trying to explain MC methods is pretty tough so check out the article.

Jarrod Nettles
  • 6,193
  • 6
  • 28
  • 46
  • If I have data of x game's rating and no of installs. Problem: predict the no of installs for y game Is this the right case to use MC simulation? – Amit Tripathi May 15 '15 at 05:53
3

Sometimes checking all the options is simply prohibitive.

Loren Pechtel
  • 8,945
  • 3
  • 33
  • 45
2

Because the estimates are usually pretty widely distributed when scheduling programming tasks it makes more sense to treat them statistically.

If we take a project which takes 100's of tasks the errors on the estimates will even out and you end up with a distribution which shows the likelihood of project completion as a range.

It also circumvents some serious issues like task buffering and student syndrome skewing the results even further.

Peter Tillemans
  • 34,983
  • 11
  • 83
  • 114