Smoothing of data with unequal number of observations for plotting?

Question

I have two data frames with unequal numbers of rows, and I need to smooth the data in both and plot them together. I can smooth each data frame with lowess/loess. However, when I try to plot the lines for both data frames together, I usually get the error "unequal number of rows". I found a way around this by using spline. I want to know if the following would be valid:

tmp1 <- spline( lowess( df1[,1], df[,2] ), n = 20 )
tmp2 <- spline( lowess( df2[,1], df2[,2] ), n = 20 )

plot( tmp1[,1], tmp1[,2], type="l" )
lines( tmp2[,1], tmp2[,2], col="red" )

I want to know whether it is "statistically" valid to plot the spline of a lowess object as its representation, because I want to limit the number of data points. This is specifically for the case where the lowess fits on two different series contain unequal numbers of points.

Seems like it should be. The key is to make sure the scales for x and y are the same. Using plot() first and then lines() should take care of that. The real question is why you think it might NOT be valid? — IRTFM, Oct 28 '11 at 18:44
Frankly I am not a statistician. I am a biologist, this approach seemed to make sense. But then i have learnt the hard way that what seems right is not necessarily so. Hence this question to make sure that my understanding is correct. I do believe in Community Wisdom. Thanks again @Dwin for your comment. This is the answer i was looking for. — Sam, Nov 03 '11 at 12:57
I am not a certified statistician either, but using loess() certainly seems more statistically "honest" than would forcing a specific polynomial fit. — IRTFM, Nov 03 '11 at 17:34

IRTFM · Accepted Answer · 2011-10-28T21:26:16.347

It would have worked if you (and I) had remembered that spline() does not return an object that can be addressed by rows and columns. It returns a two-element list of vectors. So you need to fix the spelling of the second "df" and use "[[":

# test data
df1 <- data.frame(x=rnorm(100), y=rpois(100, lambda=.5))
df2 <- data.frame(x=rnorm(200), y=rpois(200, lambda=.5))

tmp1 <- spline( lowess( df1[,1], df1[,2] ), n = 20 )
tmp2 <- spline( lowess( df2[,1], df2[,2] ), n = 20 )

plot( tmp1[[1]], tmp1[[2]], type="l" )
lines( tmp2[[1]], tmp2[[2]], col="red" )
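
To see why "[[" is needed, it can help to inspect what spline() hands back. The following is only a minimal sketch: the seed and the test data are assumptions made for illustration, not part of the original question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
df1 <- data.frame(x = rnorm(100), y = rpois(100, lambda = 0.5))

tmp1 <- spline(lowess(df1$x, df1$y), n = 20)

str(tmp1)      # a plain list with two numeric components, x and y
names(tmp1)    # the components can also be addressed by name
head(tmp1$x)   # same vector as tmp1[[1]]
head(tmp1$y)   # same vector as tmp1[[2]]

# Matrix-style indexing fails because a plain list has no dimensions
res <- try(tmp1[, 1], silent = TRUE)
inherits(res, "try-error")
```

Addressing the components as tmp1$x and tmp1$y is equivalent to tmp1[[1]] and tmp1[[2]] and may read more clearly in plotting calls.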

That example might not have been a good one to start with, since the ylim needs to be expanded to see any of the points:

plot( tmp1[[1]], tmp1[[2]], type="l", ylim=c(0,4) )
lines( tmp2[[1]], tmp2[[2]], col="red" )
points(jitter(df2[[1]]), df2[[2]],  cex=0.3, col="blue")
points(jitter( df1[[1]]), df1[[2]], cex=0.3, col="red")
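
As a variation on the above, the axis limits can be computed from the data instead of hard-coding ylim, and the two lowess fits can be drawn directly even though they have different lengths, because plot() and lines() accept a list with x and y components. This is only a sketch under assumed test data (the seed and the names lo1, lo2, xlim and ylim are illustrative); it skips the spline() step, which is then needed only if you want to thin the curve to a fixed number of points.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible
df1 <- data.frame(x = rnorm(100), y = rpois(100, lambda = 0.5))
df2 <- data.frame(x = rnorm(200), y = rpois(200, lambda = 0.5))

# Each lowess fit has as many points as its input series
lo1 <- lowess(df1$x, df1$y)
lo2 <- lowess(df2$x, df2$y)

# Shared axis limits covering both series, so neither curve is clipped
xlim <- range(df1$x, df2$x)
ylim <- range(df1$y, df2$y)

# plot() sets up the axes; lines() adds the second curve on the same scale
plot(lo1, type = "l", xlim = xlim, ylim = ylim, xlab = "x", ylab = "y")
lines(lo2, col = "red")
```

Each curve is drawn from its own x and y vectors, so the two series do not need the same number of points.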
