I am using numpy.random.lognormal to generate a lognormal sample from the mean and standard deviation of the underlying normal distribution.

np.random.lognormal(mu, sigma, size=50)

My question is: can I truncate the distribution / sample to only include certain values on the lower and upper end of the distribution? Is there a way to specify the min and max of the distribution?

Zephyr
kms
  • Mean and std are not enough to sample from lognormal(mu, sigma). See https://stackoverflow.com/questions/68575639/lognormal-distribution/68585321#68585321 for details. – Severin Pappadeux Aug 09 '21 at 22:59

1 Answer

Your last two sentences seem contradictory to me. I assume here that you want values inside an interval. If you want them outside an interval, leave a comment and I'll correct my code.

Why not just draw enough lognormally distributed values and keep the ones that fall in your interval?

import math

import numpy as np
from scipy.stats import binom, lognorm


def lognorm_in_interval(mu, sigma, k, loc=0, a=-np.inf, b=np.inf):
    """Draw k lognormal values restricted to [a, b] by rejection sampling.

    mu and sigma are the mean and std of the underlying normal distribution.
    """
    dist = lognorm(s=sigma, loc=loc, scale=math.exp(mu))
    # probability that a single draw lands in [a, b]
    p = dist.cdf(b) - dist.cdf(a)

    needed_tries = calc_needed_tries(p, k)
    x = dist.rvs(size=needed_tries)
    x = x[(a <= x) & (x <= b)]
    if len(x) >= k:
        return x[:k]
    # too few hits (probability at most 5%): draw the remainder recursively
    rest = lognorm_in_interval(mu, sigma, k - len(x), loc=loc, a=a, b=b)
    return np.concatenate([x, rest])


def calc_needed_tries(p, k):
    """calculates the smallest number of i.i.d. tries that are needed to have
    >= 95% probability of an event with probability p occurring k times"""

    assert 0 < p, "this only works for events with positive probability"

    def enough(n):
        # P(at least k successes in n tries) >= 0.95
        return binom(n, p).sf(k - 1) >= 0.95

    # double the upper bound until it suffices, then bisect over integers;
    # k - 1 tries can never yield k successes, so it is a valid lower bound
    lo, hi = k - 1, k
    while not enough(hi):
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if enough(mid) else (mid, hi)
    return hi


lognorm_in_interval(0, 1, 50, a=0.1, b=0.2)
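
If discarding draws is too wasteful for a narrow interval, inverse-CDF sampling is a common alternative to rejection sampling: draw uniform numbers between the CDF values of the two bounds and map them back through the inverse CDF. The following is only a minimal sketch under that assumption. The function name `truncated_lognormal` and its parameters are made up for illustration and are not part of NumPy or SciPy.

```python
import numpy as np
from scipy.stats import lognorm


def truncated_lognormal(mu, sigma, size, a=0.0, b=np.inf, rng=None):
    """Sample lognormal(mu, sigma) restricted to [a, b] via the inverse CDF.

    Hypothetical helper: mu and sigma describe the underlying normal.
    """
    rng = np.random.default_rng(rng)
    dist = lognorm(s=sigma, scale=np.exp(mu))
    # Map the interval bounds to probabilities.
    lo, hi = dist.cdf(a), dist.cdf(b)
    if not lo < hi:
        raise ValueError("the interval [a, b] has zero probability")
    # Draw uniformly between the two probabilities, then invert the CDF.
    u = rng.uniform(lo, hi, size=size)
    # Clip to guard against floating-point round-off at the bounds.
    return np.clip(dist.ppf(u), a, b)


samples = truncated_lognormal(0, 1, size=50, a=0.1, b=0.2, rng=0)
```

Every draw is used, so the cost does not grow as the interval gets less likely. As with rejection sampling, the mean and std of the truncated sample will differ from those of the untruncated distribution.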
Lukas S
  • This is a wrong answer. The mean and std values would be different. – Severin Pappadeux Aug 11 '21 at 17:52
  • @SeverinPappadeux The question says explicitly that it should be "based on mean and std of underlying normal distribution", not the resulting mean and std. Also, if you think there is a problem, it would be nice if you were more explicit about what you think it is and how it can be fixed. – Lukas S Aug 11 '21 at 20:45