I have a large dataset with 100853 observations. I want to determine the relationship between the two variables in my model, i.e. the log of per capita expenditure (ln_MPCE) and the share of expenditure spent on food (w_food). To do this, I run a quadratic regression and a non-parametric regression. Then I plot the data and the fitted values using the code below. However, the graphs are not plotted correctly: instead of two curves, I get a bunch of lines for both regressions. Please tell me where I am going wrong. Thanks in advance for your help.

model.par <- lm(w_food ~ ln_MPCE + I(ln_MPCE^2), data = share_efm_food_09)
summary(model.par)
library(np)
model.np <- npreg(w_food ~ ln_MPCE, regtype = "ll", bwmethod = "cv.aic", data = share_efm_food_09)

pdf("food_Ln_MPCE_curve.pdf" , width=11, height=8)
plot(share_efm_food_09$ln_MPCE, share_efm_food_09$w_food, xlab="ln_MPCE",ylab="w_food", cex=.1)
lines(share_efm_food_09$ln_MPCE, fitted(model.np), lty=1, col="blue")
lines(share_efm_food_09$ln_MPCE, fitted(model.par), lty=1, col="red")
dev.off() 
Ridhima
You should attempt to provide some sort of minimal, [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that includes sample input data, so we can run the code to see what's happening. – MrFlick May 16 '16 at 15:40

1 Answer

What's happening is that the data are not sorted by the x-value, so the line segments go back and forth: lines() connects the points in the order in which they appear in your data frame. Order the data frame by the x-value to get the curve you were expecting.

Here's an example with the built-in mtcars data frame:

m1 = lm(mpg ~ wt + I(wt^2), data=mtcars)

Plot data in default order:

with(mtcars, plot(wt, mpg))
lines(mtcars$wt, fitted(m1), col="blue")

Add a prediction line with data sorted by wt:

newdat = data.frame(wt=mtcars$wt, mpgpred=fitted(m1))
newdat = newdat[order(newdat$wt),]

lines(newdat, col="red", lwd=4)

Rather than using fitted, you can also use predict, which will return predicted values from your model for any combination of values of the independent variables. You can then provide the original data frame sorted by wt:

m1 = lm(mpg ~ wt + I(wt^2), data=mtcars)

with(mtcars, plot(wt, mpg))
lines(mtcars$wt[order(mtcars$wt)], predict(m1, newdata=mtcars[order(mtcars$wt),]), col="red")
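
Applied to the variables in the question, the same fix might look like the sketch below. The original data are not available, so `dat` is a simulated stand-in for share_efm_food_09 and its coefficients are made up purely for illustration. Only the quadratic model is fitted here; the same `ord` index should work for `fitted(model.np)` as well, assuming the fitted values are in the same row order as the data.

```r
# Simulated stand-in for share_efm_food_09 (hypothetical values, for illustration only)
set.seed(1)
n <- 500
dat <- data.frame(ln_MPCE = runif(n, 5, 9))
dat$w_food <- 1.2 - 0.1 * dat$ln_MPCE + rnorm(n, sd = 0.05)

model.par <- lm(w_food ~ ln_MPCE + I(ln_MPCE^2), data = dat)

# Row order that sorts the observations by the x variable
ord <- order(dat$ln_MPCE)

plot(dat$ln_MPCE, dat$w_food, xlab = "ln_MPCE", ylab = "w_food", cex = 0.1)

# Index both x and the fitted values by the same ordering so lines() draws one curve
lines(dat$ln_MPCE[ord], fitted(model.par)[ord], col = "red")

# Alternative: predict on an evenly spaced grid instead of on the observed x values
grid <- data.frame(ln_MPCE = seq(min(dat$ln_MPCE), max(dat$ln_MPCE), length.out = 200))
lines(grid$ln_MPCE, predict(model.par, newdata = grid), col = "darkgreen", lty = 2)
```

With around 100,000 observations, the grid version draws far fewer line segments, which can keep the PDF smaller; the ordered version shows the fit exactly at the observed points.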
eipi10