I have a set of weighted x,y points, like the ones shown below (the full set is here):

#  x       y     w
-0.038  2.0127  0.71
0.058   1.9557  1
0.067   2.0016  0.9
0.072   2.0316  0.83
...

I need to find a smoothed line that fits these points according to the importance assigned to each, i.e., a higher weight means the data point should have more influence on the line.

This is the code I have so far, which basically applies a gaussian_filter1d to the data (I got the idea from this question: line smoothing algorithm in python?):

import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Read data from file.
data = np.loadtxt('data_file', unpack=True)
x, y, w = data[0], data[1], data[2]

# Return evenly spaced numbers over a specified interval.
t = np.linspace(0, 1, len(x))
t2 = np.linspace(0, 1, 100)    
# One-dimensional linear interpolation.
x2 = np.interp(t2, t, x)
y2 = np.interp(t2, t, y)

# Obtain Gaussian filter with fixed sigma value.
sigma = 7
x3 = gaussian_filter1d(x2, sigma)
y3 = gaussian_filter1d(y2, sigma)

# Make plot.
cm = plt.get_cmap('RdYlBu')
plt.scatter(x, y, marker="o", c=w, s=40, cmap=cm, lw=0.5, vmin=0, vmax=1)
plt.plot(x3, y3, "r", lw=2)
plt.show()

This code produces the following plot (bluer dots have a higher weight value):

plot

The problem is that this fit does not consider the weights assigned to each point. How can I introduce that information into the gaussian filter?

Gabriel
  • Many of the scipy interpolations allow for weights to be specified. http://docs.scipy.org/doc/scipy/reference/interpolate.html – tom10 Sep 26 '13 at 14:37
  • @tom10 would you mind pointing me to a specific one? From what I've seen, the UnivariateSpline function accepts weights but it requires the x array to be increasing, a condition my data does not meet. – Gabriel Sep 26 '13 at 15:07
  • Are you aware that the numpy.interp docs say: "xp: The x-coordinates of the data points, must be increasing." and "Does not check that the x-coordinate sequence xp is increasing. If xp is not increasing, the results are nonsense." So, for either one, it might be best to start with a sort (using numpy.argsort so you can do all axes consistently with x). – tom10 Sep 26 '13 at 15:31
  • @tom10 I'm not sure I follow you. If I sort my data that will affect the shape of the cloud of points I'm trying to fit/interpolate. Could you expand a bit what you mean in the form of an answer? Thank you. – Gabriel Sep 26 '13 at 18:04
  • Unfortunately I don't have scipy installed on any computer that I'll have access to for a while, so I can't do a full example. But, in words: sorting should not change the shape of the cloud, it just means that the x-values are in order. That is, if you have a bunch of (x,y) pairs, the *values* in each of these pairs create the shape of the cloud, and you just need to order these by their x-values. For separate x, y, w arrays then, you need to sort all of x, y, and w **by x**: e.g., ysorted = y[np.argsort(x)], i.e., the pairs are the same, but reordered. – tom10 Sep 26 '13 at 18:37
  • @tom10 Oh I see what you mean now! I'll fix the question as soon as I figure out how to sort the arrays as you mention. Thank you very much for pointing that out, it had completely slipped by me! – Gabriel Sep 27 '13 at 20:33
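
The sorting suggested in the comments above can be sketched as follows. This is a minimal illustration, not code from the thread: the four rows are the sample rows from the question, deliberately shuffled here (the shuffled order is an assumption made for the example) so that the effect of the sort is visible.

```python
import numpy as np

# Sample rows from the question, shuffled on purpose for illustration.
x = np.array([0.067, -0.038, 0.072, 0.058])
y = np.array([2.0016, 2.0127, 2.0316, 1.9557])
w = np.array([0.9, 0.71, 0.83, 1.0])

# Indices that would sort x in increasing order.
order = np.argsort(x)

# Apply the same permutation to all three arrays so each
# (x, y, w) triple stays together; only the row order changes.
x_sorted, y_sorted, w_sorted = x[order], y[order], w[order]
```

Because the same index array is applied to x, y and w, the cloud of points is unchanged; only the order in which the points are stored differs.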

1 Answer

Note that the following idea is a workaround, not an exact solution, but it is worth trying.

The idea is to use the weights w to repeat the corresponding values in x and y. For example, if you scale w into the range [1, 10], a point whose scaled weight is 10 has its x and y values duplicated 10 times. This creates new x and y arrays in which the weight is encoded as the frequency of each value. Feeding these new arrays to your algorithm should hopefully give you the desired results, as shown in the worked examples below.

  • For the first figure, the blue-to-red spectrum corresponds to low-to-high weights. The numbers in the titles are the duplication factors described above.
  • For the second figure, which uses your data, we didn't change your color format.

enter image description here

enter image description here
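
The duplication step described above might look roughly like the sketch below. It is only one possible reading of the idea: the helper name repeat_by_weight, the max_repeats parameter, and the linear scaling with rounding are assumptions made for illustration, not the code that produced the figures. The input rows are the four sample rows shown in the question.

```python
import numpy as np


def repeat_by_weight(x, y, w, max_repeats=10):
    """Duplicate each (x, y) point in proportion to its weight.

    Hypothetical helper: weights are scaled linearly onto the integer
    range [1, max_repeats], so the lowest weight is kept once and the
    highest weight is repeated max_repeats times.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)

    span = w.max() - w.min()
    if span == 0:
        # All weights are equal: keep every point exactly once.
        counts = np.ones(len(w), dtype=int)
    else:
        scaled = 1 + (max_repeats - 1) * (w - w.min()) / span
        counts = np.rint(scaled).astype(int)

    # np.repeat keeps the original order and repeats element i counts[i] times.
    return np.repeat(x, counts), np.repeat(y, counts), counts


# The four sample rows shown in the question.
x = np.array([-0.038, 0.058, 0.067, 0.072])
y = np.array([2.0127, 1.9557, 2.0016, 2.0316])
w = np.array([0.71, 1.0, 0.9, 0.83])

x_rep, y_rep, counts = repeat_by_weight(x, y, w)
```

The arrays x_rep and y_rep can then be used in place of x and y in the interpolation and gaussian_filter1d steps from the question. Since heavier points now occupy more consecutive samples, they pull the smoothed line towards themselves more strongly; the choice of max_repeats controls how pronounced that effect is.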

Developer
  • Although this is not exactly what I asked for, it seems to fit what I need, so I'm marking it as accepted. Thank you very much @Developer! – Gabriel Oct 05 '13 at 00:33