This is a very basic question (I am a novice). I am trying to test a simple prediction with a linear model, but I don't seem to be specifying the data frame of inputs correctly.

In the call to predict, I keep getting a message that newdata has 12 rows but the variables have 21 rows. I think that is because the input variables are not being found. I saw a solution posted earlier for a single input (it suggested using a vector instead of a data frame), but that does not seem to fix my issue. Any help is greatly appreciated.

################Code is below ######################

# Reading in a csv Text File. Has Headers Quantity, Income, Price and 21 rows of values
CSVData <- read.table("C:/Users/.../CSVInput.txt",header=T,sep=",")

Model=lm(CSVData$Quantity~CSVData$Income+CSVData$Price)

###Creating a new test data set for prediction##################
BindCols1=seq(5,16,by=1)
BindCols2=seq(20,75,by=5)
PredFrame=data.frame(cbind(BindCols1,BindCols2))
colnames(PredFrame) <- c('Income','Price')
colnames(PredFrame)

coef(Model)
pc=predict(Model,PredFrame)

When I run the code, I get the following warning:

"Warning message: 'newdata' had 12 rows but variables found have 21 rows"

Also, predict does not use the new inputs (from PredFrame) for the prediction but instead uses the data that was used to fit the model.

Thanks in advance for your suggestions!!!

1 Answer

Use:

Model <- lm(Quantity ~ Income + Price,data=CSVData)

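The formula argument to lm(...) refers to column names in the data frame given in the data=... argument. With a formula written as CSVData$Quantity~CSVData$Income+CSVData$Price, the model's variables are literally named CSVData$Income and CSVData$Price, so predict(...) cannot find them in newdata and evaluates them against the original 21-row CSVData instead, hence the warning. When the formula uses bare column names, predict(Model,newdata=...) works as long as newdata has columns with the same names as the variables on the RHS of the formula.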
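As a minimal sketch of the whole workflow: the original CSV file is not available here, so the example below uses a synthetic 21-row stand-in for CSVData with the same column names. The simulated values and the coefficients used to generate Quantity are arbitrary assumptions for illustration only.

```r
# Synthetic stand-in for CSVData (assumption: the real file is not available).
# Same column names as in the question, 21 rows.
set.seed(42)
CSVData <- data.frame(
  Income = runif(21, min = 5, max = 20),
  Price  = runif(21, min = 20, max = 80)
)
# Arbitrary linear relationship plus noise, for illustration only
CSVData$Quantity <- 50 + 2 * CSVData$Income - 0.5 * CSVData$Price + rnorm(21)

# Fit with bare column names and the data= argument
Model <- lm(Quantity ~ Income + Price, data = CSVData)

# New inputs: column names must match the variables on the RHS of the formula.
# data.frame() can name the columns directly, so cbind() and colnames() are not needed.
PredFrame <- data.frame(
  Income = seq(5, 16, by = 1),
  Price  = seq(20, 75, by = 5)
)

pc <- predict(Model, newdata = PredFrame)
length(pc)  # 12, one prediction per row of PredFrame
```

With this setup predict returns one value per row of PredFrame and the row-count warning should no longer appear.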

jlhoward