Python: Generate random values from empirical distribution

Question

In Java, I usually rely on the org.apache.commons.math3.random.EmpiricalDistribution class to do the following:

Derive a probability distribution from observed data.
Generate random values from this distribution.

Is there any Python library that provides the same functionality? It seems like scipy.stats.gaussian_kde.resample does something similar, but I'm not sure if it implements the same procedure as the Java type I'm familiar with.

I think the accepted answer [here](http://stackoverflow.com/questions/485076/does-anyone-have-example-code-of-using-scipy-stats-distributions/485233#485233) has what you're looking for. — Kevin, Feb 16 '16 at 14:42
@Kevin: the linked answer doesn't work for this case, because it assumes you already know the analytical form of your distribution, whereas this question is looking for something non-parametric. — abeboparebop, Aug 07 '18 at 13:46

score 7 · Answer 1 · edited Mar 15 '19 at 20:33

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

# This represents the original "empirical" sample -- I fake it by
# sampling from a normal distribution
orig_sample_data = np.random.normal(size=10000)

# Generate a KDE from the empirical sample
sample_pdf = scipy.stats.gaussian_kde(orig_sample_data)

# Sample new datapoints from the KDE; resample() returns an array of
# shape (1, n) for 1-D data, so take the first row
new_sample_data = sample_pdf.resample(10000)[0]

# Histogram of initial empirical sample
cnts, bins, p = plt.hist(orig_sample_data, label='original sample', bins=100,
                         histtype='step', linewidth=1.5, density=True)

# Histogram of datapoints sampled from KDE
plt.hist(new_sample_data, label='sample from KDE', bins=bins,
         histtype='step', linewidth=1.5, density=True)

# Visualize the kde itself
y_kde = sample_pdf(bins)
plt.plot(bins, y_kde, label='KDE')
plt.legend()
plt.show(block=False)

new_sample_data should be drawn from roughly the same distribution as the original data (to the degree that the KDE is a good approximation to the original distribution).
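
For what it's worth, here is a minimal sketch of what `gaussian_kde.resample` does for unweighted 1-D data, as far as I understand the SciPy implementation: pick original data points at random with replacement, then add Gaussian noise whose variance is the kernel bandwidth (`kde.covariance`). The helper name `manual_kde_resample` and the seed are my own, for illustration only.

```python
import numpy as np
import scipy.stats


def manual_kde_resample(data, size, rng):
    """Draw `size` values from a Gaussian KDE fitted to 1-D `data`.

    Hypothetical helper: mimics gaussian_kde.resample for unweighted 1-D data.
    """
    data = np.asarray(data, dtype=float)
    kde = scipy.stats.gaussian_kde(data)
    # kde.covariance has shape (1, 1) for 1-D data; its square root is the
    # standard deviation of each Gaussian kernel.
    kernel_std = np.sqrt(kde.covariance[0, 0])
    # Step 1: bootstrap, i.e. pick observed points with replacement.
    centers = rng.choice(data, size=size, replace=True)
    # Step 2: jitter each picked point with kernel-sized Gaussian noise.
    return centers + rng.normal(scale=kernel_std, size=size)


rng = np.random.default_rng(0)
orig_sample_data = rng.normal(size=10000)
manual_sample = manual_kde_resample(orig_sample_data, 10000, rng)
```

Because of the added noise, the resampled values are slightly more spread out than the original data and can fall outside its observed range.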

answered Aug 07 '18 at 12:53 by abeboparebop (edited Mar 15 '19 at 20:33 by jonespm)

This is not the correct way to draw a random sample representing the original distribution. A proper method would be some kind of CDF transform. – Zanam May 22 '19 at 15:10
@Zanam: what kinds of problems would you expect from this method? I'm no stats expert, so I'm genuinely curious. – abeboparebop May 22 '19 at 15:57
Just the fact that empirical distributions usually don't fit any standard distribution that we know of. – Zanam May 22 '19 at 16:03
@Zanam: the only assumption I've made is that the original data distribution can be reasonably fit by a Gaussian-smoothed KDE -- I'm not assuming that it follows any particular standard distribution. – abeboparebop May 27 '19 at 09:04
@Zanam can you please elaborate on the point you are trying to make? – Vlad Apr 16 '21 at 15:35
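
For reference, the "CDF transform" mentioned in the comments presumably means inverse transform sampling: draw uniform numbers on [0, 1) and map them through the inverse of the empirical CDF. Below is a minimal sketch, assuming linear interpolation between sorted observations (which is what `np.quantile` does by default). The helper name `sample_from_ecdf` is made up for this example, and this is not necessarily what Zanam had in mind.

```python
import numpy as np


def sample_from_ecdf(data, size, rng):
    """Hypothetical helper: inverse transform sampling from the empirical CDF."""
    data = np.asarray(data, dtype=float)
    # Uniform draws play the role of CDF values in [0, 1).
    u = rng.random(size)
    # np.quantile evaluates the (linearly interpolated) inverse empirical CDF.
    return np.quantile(data, u)


rng = np.random.default_rng(0)
# Deliberately non-Gaussian "empirical" data: a skewed exponential sample.
orig_sample_data = rng.exponential(scale=2.0, size=10000)
new_sample_data = sample_from_ecdf(orig_sample_data, 10000, rng)
```

Unlike the KDE approach, these values never fall outside the range of the observed data, and no bandwidth has to be chosen.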
