8

I'm looking to sum up an arbitrary number of probabilistic distributions of things using a montecarlo type simulation. I'd like to randomly sample continuous distributions of something and add them to other random samples of other continuous distributions, ultimately getting a probability distribution for their combination. The distributions themselves are empirical - they aren't a function but in the form of P99 = 2.4, P90 = 7.12, P50 = 24.53, P10 = 82.14 and so on (in reality there are a bunch of those points). The distributions are more or less lognormal, so approximating them as lognormal would probably be fine, if that's necessary. But how could I enter that into SciPy's lognorm function? Or do it some other way in SciPy, or python in general?

I hope it's clear what I'm trying to do. Thanks a lot, Alex

Alex S
  • 4,726
  • 7
  • 39
  • 67

1 Answers1

3

It looks like what you have is essentially a histogram of the probability density. One thing you can do then is to use the inverse transform sampling with your empirical distribution.

As an alternative, if you expect a certain functional form of a distribution (lognorm or some other one), you can try fitting the data with the corresponding functional form.

ev-br
  • 24,968
  • 9
  • 65
  • 78
  • Hmm well I do expect them to be close to lognormal, do you know how I'd go about fitting the data to that? I was left confused with how to make a lognormal distribution fit through two given points at [the description page](http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html#scipy.stats.lognorm). If I had a P90 that was X and a P10 that was Y, how would I make a lognormal distribution that fits the two? – Alex S Apr 18 '13 at 20:33
  • I'm not sure about this two-point thing you're talking about. What I'd do, I'd first check if the distribution actually *is* log-normal. For this, I'd use the fact that for a normal distribution the moments should be related in a very precise way (cf wikipedia or mathworld or elsewhere), and that if `X` is log-normal, then `log X` is normally distributed: just calculate several first moments of the `log X`. – ev-br Apr 18 '13 at 20:55
  • Ok, but don't I have to load it into the scipy object so that I can then get random samples from it using `R = lognorm.rvs(s, size=100)` as mentioned in the lognorm function page above? – Alex S Apr 19 '13 at 18:57
  • Yes. Later on, when you've convinced yourself that your distribution is lognorm, and have estimated it's parameters, yes, indeed. – ev-br Apr 20 '13 at 16:05
  • Right, but that's what I don't know how to do. Lets assume I do have lognorm data (some times I will for sure), how do I load those into the scipy object? Do I need to calculate an `x` and an `s` to load it, and if so how to I calculate those given that all I have are the sets of numbers I mention above? Maybe the "two point" thing plays into this - when plotted on a semi-log plot, the cumulative probability function appears as a straight line. I know more than enough points on that line to fit a formula through it. But how do I get that into SciPy? – Alex S Apr 20 '13 at 18:07
  • @Zhenya Even if this answer wasn't accepted thank you for your insight!! – Greg May 13 '13 at 23:31