
I have two variables, G and I, that are correlated; each has 12 values. I computed the correlation and fitted a linear regression model called rg. Now I want to use this model to make predictions for a second variable called GP, which has 5 values: I want the predicted I value for each GP value. When I run the prediction I get the following warning:

Warning message:
'newdata' had 5 rows but variables found have 12 rows 

How can I apply the model to GP? Does GP need to have 12 values? I suppose not. Is there an option in predict.lm to do this?

G<-c(20,25,21,30,22,23,19,24,21,23,28,27)
I<-c(229,235,230,242,231,233,226,232,230,232,238,236)

# scatter plot (qqplot() would draw a Q-Q plot instead)
plot(G, I)

# regression
rg <- lm(I ~ G)
summary(rg)
coef(rg)

# correlation coefficient
cor(G, I)
cp <- cor(G, I, method = "pearson")
cs <- cor(G, I, method = "spearman")


# new data
GP <- c(30,32,34,36,38)

# predict income for these values
X1 <- data.frame(GP)

Y_pred <- predict.lm(rg, X1)
Ben Bolker
juanvg1972

1 Answer


In order to use the predict method, the names of the newdata data frame need to match the variables in the formula.

G <- c(20,25,21,30,22,23,19,24,21,23,28,27)
I <- c(229,235,230,242,231,233,226,232,230,232,238,236)

Pack the data into a data frame (the column names are taken automatically from the variable names). It's better practice to use the data argument than to pull the values from the global workspace.

dd <- data.frame(G,I)
rg <- lm(I ~ G, data=dd)
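One practical payoff of the data argument can be sketched as follows, assuming the code runs in a fresh R session with no other G on the search path. Once rg is fitted from dd, the loose G and I vectors are no longer needed for prediction, and a misnamed newdata then fails with an error instead of quietly picking up leftover workspace values. The names ok and err are just illustrative.

```r
# Data from the question, packed into a data frame as in the answer
G <- c(20, 25, 21, 30, 22, 23, 19, 24, 21, 23, 28, 27)
I <- c(229, 235, 230, 242, 231, 233, 226, 232, 230, 232, 238, 236)
dd <- data.frame(G, I)
rg <- lm(I ~ G, data = dd)

# The fitted model does not need the loose vectors any more
rm(G, I)

GP <- c(30, 32, 34, 36, 38)
ok <- predict(rg, data.frame(G = GP))   # five predictions, one per GP value

# With the wrong column name there is no G left to fall back on,
# so prediction stops with an error instead of a warning
err <- tryCatch(predict(rg, data.frame(GP)),
                error = function(e) conditionMessage(e))
err   # "object 'G' not found"
```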

New data:

GP <- c(30,32,34,36,38)
pdata <- data.frame(G=GP)  ## same name as in original model

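Note that if you don't rename the variable (data.frame(GP)), you get a data frame with a single column pdata$GP rather than pdata$G (try it and see). predict() then has to look for G elsewhere: it searches the environment in which the model formula was created. Because the original 12-element G is still in the workspace, it uses those 12 values instead, which is exactly what produces the warning 'newdata' had 5 rows but variables found have 12 rows. If no G existed there, R would stop with the error object 'G' not found. (The same rule applies in more complex models with many predictors: newdata needs a column for every variable on the right-hand side of the formula.)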
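The difference between the two data frames, and where the 12 values in the question's warning come from, can be checked directly. This is a sketch assuming the original G is still defined in the workspace, as it is in the question. It captures the warning with withCallingHandlers() so its text can be inspected, and the names wrong, right and msg are only illustrative.

```r
G <- c(20, 25, 21, 30, 22, 23, 19, 24, 21, 23, 28, 27)
I <- c(229, 235, 230, 242, 231, 233, 226, 232, 230, 232, 238, 236)
dd <- data.frame(G, I)
rg <- lm(I ~ G, data = dd)
GP <- c(30, 32, 34, 36, 38)

names(data.frame(GP))      # "GP": not a predictor in the model
names(data.frame(G = GP))  # "G": matches the right-hand side of I ~ G

# Wrong name: G is looked up in the formula's environment, where the
# original 12-element G still lives, so the 12 fitted values come back
msg <- NULL
wrong <- withCallingHandlers(
  predict(rg, data.frame(GP)),
  warning = function(w) {
    msg <<- conditionMessage(w)        # keep the warning text
    invokeRestart("muffleWarning")     # and suppress printing it
  }
)
length(wrong)  # 12
msg            # the "'newdata' had 5 rows ..." warning from the question

# Right name: one prediction per GP value
right <- predict(rg, data.frame(G = GP))
length(right)  # 5
```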

(Y_pred <- predict(rg,pdata))
##       1        2        3        4        5 
## 240.9580 243.4903 246.0227 248.5550 251.0874 
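As a sanity check, the predictions above are simply intercept + slope × GP, which can be reproduced from coef(). predict.lm() can also return intervals through its interval and level arguments; the sketch below asks for 95% prediction intervals. The variable names manual and pint are illustrative.

```r
G <- c(20, 25, 21, 30, 22, 23, 19, 24, 21, 23, 28, 27)
I <- c(229, 235, 230, 242, 231, 233, 226, 232, 230, 232, 238, 236)
dd <- data.frame(G, I)
rg <- lm(I ~ G, data = dd)
GP <- c(30, 32, 34, 36, 38)
Y_pred <- predict(rg, data.frame(G = GP))

# By hand: intercept + slope * GP, using the fitted coefficients
b <- coef(rg)
manual <- b[["(Intercept)"]] + b[["G"]] * GP
all.equal(unname(Y_pred), manual)   # TRUE

# 95% prediction intervals for new observations at these G values
pint <- predict(rg, data.frame(G = GP), interval = "prediction", level = 0.95)
pint   # matrix with columns fit, lwr, upr
```

Note that G ranges from 19 to 30 in the original data, so the predictions for GP = 32 to 38 are extrapolations beyond the observed range.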

Related (although maybe not an exact duplicate): Trouble using predict with linear model in R.

Ben Bolker
  • What's the difference between data.frame(G=GP) and data.frame(GP)? Could you explain briefly? Thanks. – Beyhan Gul Jun 25 '16 at 23:15
  • The column names of the newdata argument to `predict` need to match the RHS names in the formula used when making the lm-object. `data.frame(GP)` doesn't do that; `data.frame(G=GP)` does. – IRTFM Jun 26 '16 at 01:30
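To see which names the model expects on the right-hand side, they can be read off the model's terms. The sketch below uses base R's terms(), delete.response() and all.vars(). The helper missing_predictors() is hypothetical, not part of any package, and only reports which predictor columns a candidate newdata lacks.

```r
G <- c(20, 25, 21, 30, 22, 23, 19, 24, 21, 23, 28, 27)
I <- c(229, 235, 230, 242, 231, 233, 226, 232, 230, 232, 238, 236)
rg <- lm(I ~ G, data = data.frame(G, I))

# Right-hand-side variable names: drop the response, keep the predictors
needed <- all.vars(delete.response(terms(rg)))
needed   # "G"

# Hypothetical helper: predictor names missing from a candidate newdata
missing_predictors <- function(model, newdata) {
  setdiff(all.vars(delete.response(terms(model))), names(newdata))
}

missing_predictors(rg, data.frame(GP = c(30, 32)))  # "G"
missing_predictors(rg, data.frame(G = c(30, 32)))   # character(0)
```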