Plot density using observation weights

Question

Is there a way to plot densities using data that has observation weights?

I have a vector of observations x and a vector of integer weights y, such that y1 indicates how many observations we have of x1. That is, the density of

is equal to the density of 1, 1, 2, 2, 2, 2, 2 (2x1, 5x2). As far as I understand it, matplotlib.pyplot.hist(weights=y) allows for observation weights when plotting the histogram. Is there any equivalent for computing and plotting the density?

The reason I want the package to be able to do this is that my data is very big, and I'm looking for something more efficient than expanding the weights into repeated observations.

Alternatively, I'm open to other packages.

You only need to generate the densities from the observations? — Reut Sharabani, Nov 12 '14 at 22:32
Sorry for the confusion, I want to plot the densities as in http://stackoverflow.com/questions/4150171/how-to-create-a-density-plot-in-matplotlib — FooBar, Nov 12 '14 at 22:36
so as I understand it, you only need to create a list that you call a `histogram` and send it to one of the package suggested. Is your trouble creating that list from observations, or do you have a list and you're having trouble with the package? Or both? — Reut Sharabani, Nov 12 '14 at 22:41
I say that I know functions that allow plotting histograms using observation weights. On the other hand, I'm not aware of functions that allow plotting densities using these weights. I bring the comparison given that densities are somewhat limit cases of histograms. I am not aware of being able to plot densities using histograms. — FooBar, Nov 12 '14 at 22:44
Ahhh now I get it...! Sorry, can't help you too much there :) — Reut Sharabani, Nov 12 '14 at 22:45
see the violin plot in mpl 1.4 and the KDE estimators from scipy. — tacaswell, Nov 13 '14 at 15:16

tozCSS · Answer 1 · 2018-05-30T22:29:36.943

Statsmodels' KDEUnivariate accepts weights in its fit method. See the output of the following code.

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

# Two distinct observations; 'weight' is how often each one occurs
df = pd.DataFrame({'x': [1., 2.], 'weight': [2, 4]})
weighted = sm.nonparametric.KDEUnivariate(df.x)
noweight = sm.nonparametric.KDEUnivariate(df.x)
# Weights are only supported with fft=False
weighted.fit(fft=False, weights=df.weight)
noweight.fit()

# Plot both densities side by side on a shared y-axis
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(noweight.support, noweight.density)
ax2.plot(weighted.support, weighted.density)

ax1.set_title('No Weight')
ax2.set_title('Weighted')
plt.show()

Output: [figure: two side-by-side density plots, titled 'No Weight' and 'Weighted']

Note: Your concern about the time spent on array creation will probably not be resolved by this, because, as noted in the source code:

If FFT is False, then a ‘number_of_obs’ x ‘gridsize’ intermediate array is created
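If that intermediate array is the bottleneck, one possible workaround is to evaluate the weighted Gaussian kernel sum in chunks, so that only a chunk x gridsize array exists at any time. This is a minimal NumPy sketch, not statsmodels' implementation: the function name `weighted_gaussian_kde`, the `chunk_size` parameter and the fixed bandwidth `h` are assumptions made up for illustration.

```python
import numpy as np


def weighted_gaussian_kde(x, weights, grid, bandwidth, chunk_size=10_000):
    """Evaluate a weighted Gaussian KDE on `grid`, processing `x` in chunks.

    Hypothetical helper for illustration; `bandwidth` is a fixed kernel
    standard deviation chosen by the caller.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    w = w / w.sum()  # normalise so the density integrates to one
    density = np.zeros_like(grid)
    norm = 1.0 / (bandwidth * np.sqrt(2.0 * np.pi))
    for start in range(0, x.size, chunk_size):
        xc = x[start:start + chunk_size]
        wc = w[start:start + chunk_size]
        # (chunk, gridsize) array of standardised distances to each grid point
        z = (grid[None, :] - xc[:, None]) / bandwidth
        density += norm * (wc[:, None] * np.exp(-0.5 * z ** 2)).sum(axis=0)
    return density


x = np.array([1.0, 2.0])   # distinct observations
y = np.array([2, 5])       # how often each observation occurs
grid = np.linspace(-2.0, 5.0, 200)
h = 0.5                    # assumed fixed bandwidth, chosen arbitrarily

# Density from the two weighted points ...
weighted = weighted_gaussian_kde(x, y, grid, h)
# ... and from the expanded sample 1, 1, 2, 2, 2, 2, 2 with unit weights
expanded = weighted_gaussian_kde(np.repeat(x, y), np.ones(y.sum()), grid, h)
```

With the same fixed bandwidth, `weighted` and `expanded` describe the same curve, so the weights replace the expanded data rather than approximating it. The result can be plotted with `plt.plot(grid, weighted)`.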

Use `ax1.plot(noweight.support, noweight.density)` to have correct x-axis values. Also, note that the weights need to be a numpy array (or a column in pandas) or you will have the code complaining it can not do `weights.sum()` — fuyas, May 30 '18 at 11:42
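As an alternative along the lines of the scipy suggestion in the question comments: to my knowledge, `scipy.stats.gaussian_kde` accepts a `weights` argument (SciPy 1.2 and later). The sketch below uses made-up sample values and an arbitrary grid; because scipy picks its default bandwidth from the weighted data, the curve is not guaranteed to be numerically identical to a KDE of the expanded observations.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up example data: distinct values and their integer counts
x = np.array([1.0, 2.0, 3.5, 4.0])
y = np.array([2, 5, 1, 3])

# Pass the counts as weights instead of repeating the observations
kde = gaussian_kde(x, weights=y)

# Evaluate on a grid wide enough to cover the tails (range chosen arbitrarily)
grid = np.linspace(x.min() - 5.0, x.max() + 5.0, 500)
density = kde(grid)
```

Plotting is then just `plt.plot(grid, density)`.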
