
I am trying to use predict() to apply a model fitted to data from one time period to estimate the values for another time period. This worked for one dataset, but when I ran identical code on another dataset I got the following error:

Error in eval(predvars, data, env) :
  numeric 'envir' arg not of length one

The only difference between the two datasets was that my model for the first dataset had two predictor variables, while my model for the second dataset had only one. Why would this make a difference?

My dougfir.csv contains just two columns with thirty numbers in each, labeled height and dryshoot.

My linear model is:

fitdougfir <- lm(dryshoot ~ height, data = dougfir)

It gets a little complicated (and messy, sorry! I am new to R) because I then made a second .csv. The one I used to build my model contained values from June only. My new .csv (called alldatadougfir.csv) also includes values from October, plus a date column that labels each row either "june" or "october".

I did the following to separate the height data by date:

junedatadougfir <- alldatadougfir[alldatadougfir$date == "june", c("height")]
octoberdatadougfir <- alldatadougfir[alldatadougfir$date == "october", c("height")]

I then wanted to use my June model to predict the October dryshoot values from height, so I did the following:

predict(fitdougfir, newdata=junedatadougfir)
predict(fitdougfir, newdata=octoberdatadougfir)

Again, this worked with identical code on another dataset; the only difference was that the model there had two predictor variables instead of the single one (height) I have here.
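Since a reproducible example was requested, here is a minimal sketch that should trigger the same error. The data are simulated stand-ins (the real dougfir.csv and alldatadougfir.csv are not available), so the numbers, and the assumed linear relationship used to generate them, are hypothetical; only the subsetting and predict() calls mirror the code above.

```r
# Hypothetical stand-in data: simulated values, not the real Douglas-fir measurements
set.seed(1)
dougfir <- data.frame(height = runif(30, 10, 50))
dougfir$dryshoot <- 2 + 0.5 * dougfir$height + rnorm(30)

alldatadougfir <- data.frame(
  date   = rep(c("june", "october"), each = 30),
  height = c(dougfir$height, runif(30, 15, 60))
)

fitdougfir <- lm(dryshoot ~ height, data = dougfir)

# Selecting a single column with [rows, "height"] returns a plain numeric vector
octoberdatadougfir <- alldatadougfir[alldatadougfir$date == "october", c("height")]
class(octoberdatadougfir)   # "numeric"

# predict() is handed a bare vector instead of a data frame, so it fails
res <- tryCatch(predict(fitdougfir, newdata = octoberdatadougfir),
                error = function(e) e)
conditionMessage(res)       # "numeric 'envir' arg not of length one"
```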

Ben Bolker
  • Can you please provide a [reproducible example](http://tinyurl.com/reproducible-000)? – Ben Bolker Aug 07 '15 at 21:46
  • Another difference is that in the first prediction you use the data that built your model as the new data. Is that correct? You mentioned "...use my June model to predict my October dryshoots...". – AntoniosK Aug 07 '15 at 21:49
  • As @BenBolker noted, a reproducible example is needed for the community to help you. – alexwhitworth Aug 07 '15 at 21:57
  • This may be a duplicate of http://stackoverflow.com/questions/9026383/r-numeric-envir-arg-not-of-length-one-in-predict – Whitebeard Aug 07 '15 at 22:01
  • @SamThomas, very nearly, but I think it's worth explaining why ... – Ben Bolker Aug 07 '15 at 22:15

1 Answer


This is essentially a variation on "R: numeric 'envir' arg not of length one in predict()", but it might not be obvious why. What's happening is that by selecting a single column of your data frame, you are triggering R's (often annoying/unwanted) default behaviour (drop = TRUE) of collapsing the one-column result to a plain numeric vector. This triggers issue #2 from the linked answer:

The predictor variable needs to be passed in as a named column in a data frame, so that predict() knows what the numbers [it's] been handed represent ... [emphasis added]
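To see where the wording of the error comes from, here is a simplified sketch of the step that fails. Internally, predict() for an lm fit calls model.frame(), which evaluates the model's variables with eval(predvars, data, env), using whatever you passed as newdata for `data`. The toy `height` values below are made up for illustration.

```r
# The variable predict() needs to look up (stand-in for the model's predvars)
expr <- quote(height)

# A data frame (or list) works as an evaluation environment:
# 'height' is found as a column
ok <- eval(expr, data.frame(height = c(1.5, 2.5)))
ok    # 1.5 2.5

# A bare numeric vector does not: eval() interprets a numeric 'envir' as a
# frame number (as in sys.call()) and rejects anything longer than one
bad <- tryCatch(eval(expr, c(1.5, 2.5)), error = function(e) conditionMessage(e))
bad   # "numeric 'envir' arg not of length one"
```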

Watch this:

dd <- data.frame(x=1:20,y=1:20)
str(dd[dd$x<10,"y"])  ## select some rows and a single column
## int [1:9] 1 2 3 4 5 6 7 8 9
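This also explains why the two-predictor version worked: a selection of two or more columns cannot be collapsed to a single vector, so it stays a data frame with named columns. A small sketch, where the extra column `z` is hypothetical and added only for the comparison:

```r
dd <- data.frame(x = 1:20, y = 1:20, z = 20:1)

one <- dd[dd$x < 10, "y"]           # single column: dropped to a vector
two <- dd[dd$x < 10, c("y", "z")]   # two columns: still a data frame

class(one)  # "integer"
class(two)  # "data.frame"
```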

You could specify drop=FALSE, which gives you a data frame with a single column rather than just the column's values as a vector:

str(dd[dd$x<10,"y",drop=FALSE])
## 'data.frame':   9 obs. of  1 variable:
## $ y: int  1 2 3 4 5 6 7 8 9
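Applied to the question's code, the drop=FALSE fix might look like the sketch below. The data are again simulated stand-ins for the CSV files (hypothetical values), and `predoct` is just an illustrative name.

```r
# Hypothetical stand-in data (simulated; the real CSV files are not available)
set.seed(1)
dougfir <- data.frame(height = runif(30, 10, 50))
dougfir$dryshoot <- 2 + 0.5 * dougfir$height + rnorm(30)
fitdougfir <- lm(dryshoot ~ height, data = dougfir)

alldatadougfir <- data.frame(
  date   = rep(c("june", "october"), each = 30),
  height = c(dougfir$height, runif(30, 15, 60))
)

# drop = FALSE keeps the one-column result as a data frame with a 'height' column
octoberdatadougfir <- alldatadougfir[alldatadougfir$date == "october", "height",
                                     drop = FALSE]
str(octoberdatadougfir)

# predict() can now find 'height' by name
predoct <- predict(fitdougfir, newdata = octoberdatadougfir)
head(predoct)
```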

Alternatively, you don't have to leave out the response variable (or any other extra columns) when selecting new data; predict() only looks up the predictors named in the model formula and ignores the rest.

str(dd[dd$x<10,])
## 'data.frame':    9 obs. of  2 variables:
##  $ x: int  1 2 3 4 5 6 7 8 9
##  $ y: int  1 2 3 4 5 6 7 8 9
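For the question's data, that would mean selecting the October rows and keeping every column. A minimal sketch, assuming alldatadougfir also holds a dryshoot column (values here are simulated and hypothetical; `octoberall`, `p_all` and `p_height` are illustrative names), checks that the extra columns do not change the predictions:

```r
# Hypothetical stand-in data (simulated; the real CSV files are not available)
set.seed(1)
dougfir <- data.frame(height = runif(30, 10, 50))
dougfir$dryshoot <- 2 + 0.5 * dougfir$height + rnorm(30)
fitdougfir <- lm(dryshoot ~ height, data = dougfir)

alldatadougfir <- data.frame(
  date     = rep(c("june", "october"), each = 30),
  height   = c(dougfir$height, runif(30, 15, 60)),
  dryshoot = c(dougfir$dryshoot, runif(30, 5, 30))
)

# Leave the column index empty: all columns are kept, so the result stays a data frame
octoberall <- alldatadougfir[alldatadougfir$date == "october", ]

# predict() uses only 'height'; the 'date' and 'dryshoot' columns are ignored
p_all    <- predict(fitdougfir, newdata = octoberall)
p_height <- predict(fitdougfir, newdata = octoberall[, "height", drop = FALSE])
all.equal(p_all, p_height)  # TRUE
```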
Ben Bolker
  • Ben, I am so grateful, this is very helpful. Thank you for not only giving me an answer, but also for explaining how it works! – Kira Taylor Aug 08 '15 at 23:05