5

I am trying to smooth the following data using python gaussian_kde however it is not working properly, it looks like the kde it is resampling for the distribution for the whole dataset instead of using a bandwidht for each point and giving the weights to do the smoothing

from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np
y=[ 191.78 ,   191.59,    191.59,    191.41,    191.47,    191.33,    191.25  \
  ,191.33 ,   191.48 ,   191.48,    191.51,    191.43,    191.42,    191.54    \
  ,191.5975,  191.555,   191.52 ,   191.25 ,   191.15  ,  191.01  ]
x = np.linspace(1 ,20,len(y))
kde= gaussian_kde(y)
kde.set_bandwidth(bw_method=kde.factor / 3)

fig, ax = plt.subplots(figsize=(10, 10))
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False)
ax.scatter(x, y, color='black', label='data')
ax.plot(x,y,color='red')
ax.plot(x,kde(x))

Here it is the chart of the data

Chart of the data without smoothing

You can notice that the chart it is not smoothing the line

Chart after smoothing

Erick Gomez
  • 83
  • 1
  • 1
  • 6
  • I forgot to mention this is the expected result http://content.screencast.com/users/lestat_ega/folders/Jing/media/c91770e0-fd53-4082-af6c-4a2774163371/smooth2.png – Erick Gomez Oct 02 '15 at 05:16
  • Why do you not set `bw_method` in the constructor as described in the docs? – cel Oct 02 '15 at 05:51
  • Also `kde` calculates a probability density and does not smooth data. – cel Oct 02 '15 at 05:58
  • Take a look at `help(gaussian_kde)`, there is an example that seems to work. Then step by step make one (or a few) modification to make it look more and more like your program. When it stops working, you've found what's wrong *and* how to fix it. – uhoh Oct 02 '15 at 06:25

1 Answers1

5

You are thinking that the kde_gaussian smooths a line, but what it is actually doing is smoothing the density distribution estimate of a dataset. Your data isn't a dataset like that, it's x/y coordinates.

Here are some examples of ways of smoothing linear data:

#from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import numpy as np

from scipy import interpolate

from scipy import ndimage

y=[ 191.78 ,   191.59,    191.59,    191.41,    191.47,    191.33,    191.25  \
  ,191.33 ,   191.48 ,   191.48,    191.51,    191.43,    191.42,    191.54    \
  ,191.5975,  191.555,   191.52 ,   191.25 ,   191.15  ,  191.01  ]
x = np.linspace(1 ,20,len(y))

# convert both to arrays
x_sm = np.array(x)
y_sm = np.array(y)

# resample to lots more points - needed for the smoothed curves
x_smooth = np.linspace(x_sm.min(), x_sm.max(), 200)

# spline - always goes through all the data points x/y
y_spline = interpolate.spline(x, y, x_smooth)

spl = interpolate.UnivariateSpline(x, y)

sigma = 2
x_g1d = ndimage.gaussian_filter1d(x_sm, sigma)
y_g1d = ndimage.gaussian_filter1d(y_sm, sigma)

fig, ax = plt.subplots(figsize=(10, 10))
ax.legend(loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False)

plt.plot(x_sm, y_sm, 'green', linewidth=1)
plt.plot(x_smooth, y_spline, 'red', linewidth=1)
plt.plot(x_smooth, spl(x_smooth), 'yellow', linewidth=1)
plt.plot(x_g1d,y_g1d, 'magenta', linewidth=1)

plt.show()

The plot looks like this:

enter image description here

Green is your original data, red is the spline, yellow is the UnivariateSpline and magenta is the gaussian_1d filtered data. If you lookup these functions there may be parameters like sigma that you can vary to further smooth the data, possibly. Have a google for the documentation.

  • I found this really helpful!! Would there be a way to automatically locate the peaks? – GeoMonkey Feb 09 '16 at 00:41
  • The y coordinates are in the various plt.plot calls so you could find the inflections numerically - i.e. by comparing successive values and looking for the change of direction. – DisappointedByUnaccountableMod Feb 10 '16 at 07:06
  • `scipy.interpolate.spline()` no longer seems to exist. Here's a (newer?) version that seems to work: https://stackoverflow.com/a/31544486/1386750 – AstroFloyd Mar 14 '20 at 13:54