I have a dataset with observations for three years (e.g., 2000, 2005, and 2010) and need to interpolate the values for the years in between using R. I tried using a spline for this, but the interpolated values fall outside the original range. In the example below they even become negative.

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)
plot(spline(years, outcome_values, xout = seq(min(years), max(years))))
points(years, outcome_values, pch = 16)

Someone described this situation and a solution in Python using a lower-order spline (see "Smooth curved line between 3 points in plot" and "interpolate curve between three values"), but I have not been able to figure out how to do this in R. Any pointers would be appreciated.

Bryan
  • so.. you need linear interpolation? – Wimpel Jun 19 '20 at 13:51
  • I'm not sure how to approach this question - you have interpolated a smooth curve between 3 data points using a cubic spline. Looks like it worked great. There's nothing about cubic splines that would force interpolated values to be positive, or to be within the original range. If you used this R method on the data in the Python questions you link, you would get the same (or close) results, and if you used the Python methods in this data, they would provide the same (or close) results to what you get in R. So it seems like your question stems from not understanding the method. – Gregor Thomas Jun 19 '20 at 13:52
  • Splines are just a form of interpolation. They have no knowledge of whether the numbers they come up with are "reasonable" or not: they just apply their algorithm. Different spline functions come up with different estimates because they use different algorithms. The fact that in one specific case one spline came up with "reasonable" estimates and another did not doesn't make the first method right and the second method wrong. Your options might include using different methods, modifying the parameters of a given method, or "transform your data, apply the spline, and back-transform". – Limey Jun 19 '20 at 13:53
  • 2
    Overall, seems like maybe you should go to stats.stackexchange and ask a methodological question about what interpolation method to use. If you want to require everything to be positive, a common approach generally would be to log the raw data, interpolate on the log scale, and exponentiate the result. Maybe that's enough for you here? – Gregor Thomas Jun 19 '20 at 13:54
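
As one illustration of the "different methods" option raised in the comments, here is a minimal sketch using a monotone spline. It assumes the data are monotone (as in the example) and uses `splinefun()` with `method = "monoH.FC"` from base R's stats package; the names `f_mono`, `all_years`, and `interpolated` are made up for this sketch. It is not something proposed in the thread, just one possible choice.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# Monotone Hermite interpolation (Fritsch-Carlson). For monotone input data
# the interpolant is monotone too, so it cannot leave the range of the data.
f_mono <- splinefun(years, outcome_values, method = "monoH.FC")

# Evaluate at every whole year between the first and last observation
all_years <- seq(min(years), max(years))
interpolated <- f_mono(all_years)

plot(all_years, interpolated, type = "b")
points(years, outcome_values, pch = 16)
```

Whether a monotone curve is appropriate depends on the data; if the observed values are not monotone, this method does not give the same guarantee.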

2 Answers

Here's how to do it with a log transform on the outcome. This guarantees that the interpolated values are positive, and it changes the shape of the curve in a way you might like.

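years = c(2000, 2005, 2010)
outcome_values = c(1, 10, 90)

# fit the spline on the log scale, evaluated at 10 evenly spaced points
sp = spline(years, log(outcome_values), xout = seq(min(years), max(years), length.out = 10))
# back-transform with exp() so the plotted values are on the original scale
plot(sp$x, exp(sp$y))
points(years, outcome_values, pch = 16)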
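
Since the question asks for values at the in-between years, the same idea can be evaluated at whole years instead of 10 evenly spaced points. This is a small sketch of that variation; `all_years`, `sp_log`, and `interpolated` are names chosen here for illustration.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# Every whole year from the first to the last observation
all_years <- seq(min(years), max(years))

# Interpolate log(outcome) at each year, then back-transform with exp()
sp_log <- spline(years, log(outcome_values), xout = all_years)
interpolated <- data.frame(year = sp_log$x, outcome = exp(sp_log$y))

print(interpolated)
```

Because the values are exponentiated at the end, they are positive by construction, and the three observed years reproduce the original values.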

Gregor Thomas

You can lower the degree of the spline, but this won't solve your problem. It is the nature of your data that causes negative estimates:

library(splines)

years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# quadratic spline on a B-spline basis (degree = 2)
fit2 <- lm(outcome_values ~ bs(years, degree = 2))

plot(years, outcome_values, pch = 16)
lines(2000:2010, predict(fit2, data.frame(years = 2000:2010)), col = "blue")

Negative predictions do not mean that anything is wrong with the spline. If you need the interpolated values to stay within the range of the data, use linear interpolation instead.
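
For completeness, a minimal sketch of linear interpolation using `approx()` from base R; the names `all_years` and `lin` are chosen here for illustration.

```r
years <- c(2000, 2005, 2010)
outcome_values <- c(1, 10, 90)

# Every whole year from the first to the last observation
all_years <- seq(min(years), max(years))

# Piecewise-linear interpolation between neighbouring observations
lin <- approx(years, outcome_values, xout = all_years)

plot(lin$x, lin$y, type = "b")
points(years, outcome_values, pch = 16)
```

Each interpolated value lies on the straight line between its two neighbouring observations (for example, 2001 gives 2.8), so the results stay within the range of the data.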

slava-kohut