I have two data frames with unequal numbers of rows, and I need to smooth the data in both and plot them together. I can smooth each data frame with lowess/loess. However, when I try to plot the lines for both data frames together, I usually get the error "unequal number of rows". I found a way around this by using spline. I want to know if the following would be valid:

tmp1 <- spline( lowess( df1[,1], df[,2] ), n = 20 )
tmp2 <- spline( lowess( df2[,1], df2[,2] ), n = 20 )

plot( tmp1[,1], tmp1[,2], type="l" )
lines( tmp2[,1], tmp2[,2], col="red" )

I want to know whether it is "statistically" valid to plot a spline of a lowess object as its representation, because I want to limit the number of data points. This is specifically for the case where the lowess fits on two different series contain unequal numbers of points.

Sam
  • Seems like it should be. The key is to make sure the scales for x and y are the same. Using plot() first and then lines() should take care of that. The real question is why you think it might NOT be valid? – IRTFM Oct 28 '11 at 18:44
  • Frankly, I am not a statistician. I am a biologist, and this approach seemed to make sense. But I have learned the hard way that what seems right is not necessarily so. Hence this question, to make sure that my understanding is correct. I do believe in community wisdom. Thanks again @Dwin for your comment. This is the answer I was looking for. – Sam Nov 03 '11 at 12:57
  • I am not a certified statistician either, but using loess() certainly seems more statistically "honest" than would forcing a specific polynomial fit. – IRTFM Nov 03 '11 at 17:34

1 Answer

It would have worked if you (and I) had remembered that spline() does not return an object that can be addressed by rows and columns. It returns a two-element list of vectors (x and y). So you need to fix the spelling of the second "df" (it should be "df1") and use "[[":

# test data
df1 <- data.frame(x=rnorm(100), y=rpois(100, lambda=.5))
df2 <- data.frame(x=rnorm(200), y=rpois(200, lambda=.5))

tmp1 <- spline( lowess( df1[,1], df1[,2] ), n = 20 )
tmp2 <- spline( lowess( df2[,1], df2[,2] ), n = 20 )

plot( tmp1[[1]], tmp1[[2]], type="l" )
lines( tmp2[[1]], tmp2[[2]], col="red" )
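To see why the matrix-style `[,1]` indexing fails, it can help to inspect what spline() actually returns. This is only a minimal sketch on the same kind of test data; the seed is an arbitrary assumption added for reproducibility:

```r
set.seed(1)  # arbitrary seed, assumed here only so the run is reproducible
df1 <- data.frame(x = rnorm(100), y = rpois(100, lambda = 0.5))

tmp1 <- spline(lowess(df1[, 1], df1[, 2]), n = 20)

is.list(tmp1)     # spline() returns a plain list, not a matrix or data frame
names(tmp1)       # the two components are named "x" and "y"
length(tmp1$x)    # one value per interpolation point requested via n

# Access by name is equivalent to tmp1[[1]] and tmp1[[2]]
identical(tmp1$x, tmp1[[1]])
identical(tmp1$y, tmp1[[2]])
```

Using `tmp1$x` and `tmp1$y` instead of `[[1]]` and `[[2]]` is a matter of taste, but it may make the plotting calls easier to read.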

The test data above might not have been a good choice to get started with, since the ylim needs to be expanded to see any of the points:

# widen the y-axis so the raw (jittered) points are visible
plot( tmp1[[1]], tmp1[[2]], type="l", ylim=c(0,4) )
lines( tmp2[[1]], tmp2[[2]], col="red" )
points( jitter(df2[[1]]), df2[[2]], cex=0.3, col="blue" )
points( jitter(df1[[1]]), df1[[2]], cex=0.3, col="red" )
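As a side note, plot() followed by lines() does not require the two curves to have the same number of points, so the lowess results can probably also be drawn directly, without the spline step. The sketch below assumes that thinning to 20 points is not essential; the seed and the names `lo1`, `lo2`, `xr` and `yr` are made up for illustration:

```r
set.seed(1)  # arbitrary seed, assumed here only so the run is reproducible
df1 <- data.frame(x = rnorm(100), y = rpois(100, lambda = 0.5))
df2 <- data.frame(x = rnorm(200), y = rpois(200, lambda = 0.5))

# lowess() returns a list with components x and y, one pair per input row
lo1 <- lowess(df1$x, df1$y)
lo2 <- lowess(df2$x, df2$y)

# common axis limits so that both curves fit in the same plot region
xr <- range(lo1$x, lo2$x)
yr <- range(lo1$y, lo2$y)

plot(lo1, type = "l", xlim = xr, ylim = yr, xlab = "x", ylab = "smoothed y")
lines(lo2, col = "red")  # 200 points drawn over a 100-point curve
```

If the goal really is to reduce the number of plotted points, the spline() approach shown earlier still applies; this variant only illustrates that unequal lengths by themselves are not a problem for plotting.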
IRTFM