5

Does anyone have suggestions for efficiently truncating the SciPy random distributions? For example, if I generate random values like so:

import scipy.stats as stats
print(stats.logistic.rvs(loc=0, scale=1, size=1000))

How would I go about constraining the output values between 0 and 1 without changing the original parameters of the distribution and without changing the sample size, all while minimizing the amount of work the machine has to do?

TimY

3 Answers

8

Your question is more of a statistics question than a SciPy question. In general, to build an efficient sampler you need to normalize the density over the interval of interest and be able to evaluate the CDF on that interval analytically. Edit: it turns out that this is possible here, so rejection sampling is not needed:

import scipy.stats as stats

import matplotlib.pyplot as plt
import numpy as np
import numpy.random as rnd

# Plot the original (untruncated) logistic PDF
xrng = np.arange(-10, 10, .1)
yrng = stats.logistic.pdf(xrng)
plt.plot(xrng, yrng)

# Plot the truncated PDF on [0, 1]; nrm is the probability mass
# of the original distribution inside that interval
nrm = stats.logistic.cdf(1) - stats.logistic.cdf(0)
xrng = np.arange(0, 1, .01)
yrng = stats.logistic.pdf(xrng) / nrm
plt.plot(xrng, yrng)

# Sample using the inverse CDF: draw uniforms on [cdf(0), cdf(1))
# and map them back through the ppf
yr = rnd.rand(100000) * nrm + stats.logistic.cdf(0)
xr = stats.logistic.ppf(yr)
plt.hist(xr, density=True)

plt.show()
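The same inverse-CDF idea should carry over to other `loc` and `scale` values by working with a frozen distribution. The following is a minimal sketch, not part of the original answer: the helper name `truncated_rvs` and its arguments are hypothetical, and it assumes the distribution's `cdf` and `ppf` are cheap to evaluate.

```python
import numpy as np
import scipy.stats as stats


def truncated_rvs(dist, lower, upper, size, rng=None):
    """Draw `size` samples from a frozen SciPy distribution restricted to [lower, upper].

    Hypothetical helper for illustration; `dist` is a frozen distribution
    such as stats.logistic(loc=..., scale=...).
    """
    rng = np.random.default_rng() if rng is None else rng
    # CDF values at the truncation bounds
    cdf_lo, cdf_hi = dist.cdf(lower), dist.cdf(upper)
    # Uniform draws on [cdf_lo, cdf_hi), mapped back through the inverse CDF
    u = rng.uniform(cdf_lo, cdf_hi, size=size)
    return dist.ppf(u)


# loc and scale are set when freezing, just as they would be passed to logistic.rvs()
frozen = stats.logistic(loc=0.5, scale=2.0)
samples = truncated_rvs(frozen, 0.0, 1.0, size=1000, rng=np.random.default_rng(42))
```

Every uniform draw yields exactly one sample, so the sample size is preserved and nothing is thrown away.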
Neapolitan
user1149913
  • two issues: instead of integrate.quad you can directly use logistic.cdf, and the lst = xr[yr – Josef Jul 15 '12 at 22:57
  • Yes, those are both good points, but actually it turns out there is a much better solution anyway... see edits. – user1149913 Jul 16 '12 at 01:47
  • Logistic has a nice expression for the ppf and transforming a uniform random variable with the ppf is much better, but I liked your recipe for rejection sampling for the case when the ppf is expensive to calculate. – Josef Jul 16 '12 at 08:40
  • Is it possible to do this with defined parameters? I don't understand where the parameters of the distribution can be changed, as in the logistic.rvs() function. – Uis234 May 20 '16 at 08:16
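For reference, the rejection-sampling alternative mentioned in the comments (useful when the ppf is expensive) might look roughly like the sketch below. This is not the code from the earlier revision of the answer; the function name `truncated_rvs_rejection` is hypothetical, and it assumes the `cdf` is cheap enough to use for sizing each batch of draws.

```python
import numpy as np
import scipy.stats as stats


def truncated_rvs_rejection(dist, lower, upper, size, rng=None):
    """Rejection sampling: draw unconstrained values and keep those in [lower, upper].

    Hypothetical helper for illustration; `dist` is a frozen SciPy distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Fraction of unconstrained draws expected to land inside the interval
    accept_prob = dist.cdf(upper) - dist.cdf(lower)
    if accept_prob <= 0:
        raise ValueError("the interval has no probability mass")
    chunks = []
    n_kept = 0
    while n_kept < size:
        # Oversample a little so that one or two passes usually suffice
        n_draw = int((size - n_kept) / accept_prob * 1.2) + 10
        draws = dist.rvs(size=n_draw, random_state=rng)
        kept = draws[(draws >= lower) & (draws <= upper)]
        chunks.append(kept)
        n_kept += kept.size
    # Trim the surplus so the requested sample size is returned exactly
    return np.concatenate(chunks)[:size]


samples = truncated_rvs_rejection(
    stats.logistic(loc=0, scale=1), 0.0, 1.0, size=1000,
    rng=np.random.default_rng(0),
)
```

The cost grows as the interval's probability mass shrinks, since most draws are then discarded, which is why the ppf route is preferable whenever it is available.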
0

What are you trying to achieve? The logistic distribution by definition has infinite range. If you truncate the results in any way, their distribution will change. If you just want random numbers in a range, there's random.random().

ivan_pozdeev
  • I used the logistic simply as an example, but there are situations where a real-world distribution will be almost identical to a theoretical one but, due to certain external constraints, cannot realistically go above a certain value. Truncating can, in many cases, add only a tiny error that can be deemed negligible for modelling. If you're not convinced, I think it is perhaps best to see this as simply a theoretical exercise. – TimY Jul 15 '12 at 10:52
0

You could normalise your results to the maximum returned value:

>>> import numpy as np
>>> import scipy.stats as stats
>>> dist = stats.logistic.rvs(loc=0, scale=1, size=1000)
>>> norm_dist = dist / np.max(dist)

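This will keep the 'shape' the same and cap the values at 1 (negative draws stay negative, so the result only lies between 0 and 1 when the distribution has no negative values). But if you're doing repeated draws from a distribution, be sure to normalise all the draws to the same value (the max from all draws).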

However, you want to be pretty careful if you're doing this kind of thing that it makes sense within the context of what you are trying to achieve (which I don't have enough info to comment on...)
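Normalising repeated draws to a shared maximum could be sketched as follows. This is an illustration only: the number of draws, the seed, and the variable names are assumptions, not part of the answer.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Three independent draws from the same logistic distribution
draws = [
    stats.logistic.rvs(loc=0, scale=1, size=1000, random_state=rng)
    for _ in range(3)
]

# Use one maximum across all draws so every batch is rescaled by the same factor
common_max = max(d.max() for d in draws)
normalised = [d / common_max for d in draws]
```

Dividing each batch by its own maximum would instead apply a different scale factor to each one, so the batches would no longer be comparable.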

TimY
fraxel
  • I'm very sorry - I was not clear (I updated the question). I did not mean "shape," I meant "original parameters." Also, I think this (for some distributions) may have the same effect as changing the scale parameter. – TimY Jul 15 '12 at 10:59