Using the predict() function with 2 or more explanatory variables

Question

I have built some models using lm(). The response variable is the abundance of a species at one of two locations each month, given as a percentage to six decimal places. Percentages have to be used because the data were collected through citizen science: the actual total recorded each month is not reliable, but the overall proportion (%) at each of the two locations is.

The best-fit model has two explanatory variables, wind speed and wind direction, both numeric. I would like to apply the predict() function to it. So far, I have been able to do this by following the instructions from the post here, as shown below:

```r
model <- lm(y ~ x1, data = df)
new.df <- data.frame(x1 = c(0, 10, 20))
predict(model, new.df)
```

This seems to work well for models with a single explanatory variable, but I am having trouble adding a second one so that it works with my best-fit model.

So far, this is what I have come up with; however, the results do not make sense, as two of them are negative:

```r
model2 <- lm(y ~ x1 + x2, data = df)
# each column passed as a plain name = values argument
new.df <- data.frame(x1 = c(1, 6, 12), x2 = c(1, 10, 20))
predict(model2, new.df)
```

```
        1          2          3 
0.4123114 -0.3975497 -1.3014379
```

I would be grateful if anyone could offer any suggestions.
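A minimal sketch of the two-predictor call on simulated data (the data frame and coefficients below are hypothetical, not the asker's data). `predict()` looks up each predictor in `newdata` by the name used in the model formula, so each column should be passed to `data.frame()` as a plain `name = values` argument. Wrapping an argument in extra parentheses, such as `(x2 = c(1, 10, 20))`, makes R treat it as an assignment, and the resulting column is not named `x2`, so `predict()` cannot find it in `newdata`.

```r
# Hypothetical simulated data standing in for the real df
set.seed(1)
df <- data.frame(x1 = runif(50, 0, 20),    # e.g. wind speed
                 x2 = runif(50, 0, 360))   # e.g. wind direction (degrees)
df$y <- 2 + 0.3 * df$x1 - 0.01 * df$x2 + rnorm(50)

model2 <- lm(y ~ x1 + x2, data = df)

# Name every column exactly as it appears in the formula
new.df <- data.frame(x1 = c(1, 6, 12), x2 = c(1, 10, 20))
stopifnot(all(c("x1", "x2") %in% names(new.df)))

# One prediction per row of new.df, named after its row names
pred <- predict(model2, newdata = new.df)
print(pred)
```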

The `lm` model does not know that your `y` cannot be negative, so for some plausible combination of `x` values it can predict a negative `y`. What you are most likely looking for is `glm` with `family = poisson`. — missuse, Mar 30 '18 at 11:47
I can't see anything wrong with your code; it's hard to say more without details about `df` and/or the model fit. What is the quality of the fitted linear model? Please review how to provide a [minimal reproducible example/attempt](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including sample data. — Maurits Evers, Mar 30 '18 at 11:48
I can't really help without a sample of your input data, but @missuse is correct: `lm` appears to be working correctly. If negative values are impossible in your data, you'll need to specify a `glm` link function that is better suited to your data. — jdobres, Mar 30 '18 at 11:49
Thank you for the feedback. I have updated my question and hope it is sufficient. I began with `glm`, but I was having difficulty because the response variable is a percentage with six decimal places. I think I may have to look for a different method to predict how these explanatory variables affect the response. — Jo Harris, Mar 30 '18 at 12:20
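One commonly used way to fit a `glm` to a percentage response (a swapped-in technique, not the `family = poisson` suggestion and not something this thread settled on) is to model the proportion (percentage / 100) with `family = quasibinomial`, which uses a logit link and, unlike `binomial`, does not warn about non-integer proportions. A minimal sketch on hypothetical simulated data; `type = "response"` returns predictions on the proportion scale, so they always stay between 0 and 1:

```r
# Hypothetical simulated data: pct is a percentage between 0 and 100
set.seed(2)
n  <- 60
df <- data.frame(x1 = runif(n, 0, 20),     # e.g. wind speed
                 x2 = runif(n, 0, 360))    # e.g. wind direction (degrees)
eta    <- -1 + 0.15 * df$x1 - 0.003 * df$x2 + rnorm(n, sd = 0.3)
df$pct <- 100 * plogis(eta)

# Model the proportion (0-1) with a logit link; quasibinomial accepts
# non-integer proportions and estimates the dispersion from the data
fit <- glm(pct / 100 ~ x1 + x2, family = quasibinomial, data = df)

new.df <- data.frame(x1 = c(1, 6, 12), x2 = c(1, 10, 20))

# type = "response" maps the linear predictor back to a proportion
pred_pct <- 100 * predict(fit, newdata = new.df, type = "response")
print(round(pred_pct, 2))
```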

score 0 · Answer 1 · answered Mar 30 '18 at 13:21


If you need x1 + x2 and their interaction (`y ~ x1 + x2 + x1:x2`), try this:

```r
> df <- data.frame(x1=c(2, 12, 24), x2=c(2, 20, 40), y=c(1,2,3)) # example DF

> model2 <- lm(y ~ x1*x2, data=df)
> new.df <- data.frame(x1=c(1, 6, 12), x2=c(1, 10, 20))
> predict(model2, new.df)
  1   2   3 
1.0 1.5 2.0
```
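Note that `y ~ x1*x2` expands to `y ~ x1 + x2 + x1:x2`, i.e. four coefficients, so three rows are not enough to estimate them all (`lm` reports `NA` for one of them). A sketch with more rows of hypothetical, simulated data shows that `predict()` simply evaluates b0 + b1*x1 + b2*x2 + b3*x1*x2 for each row of `newdata`:

```r
# Hypothetical simulated data with enough rows to estimate all four terms
set.seed(3)
df <- data.frame(x1 = runif(30, 0, 20), x2 = runif(30, 0, 40))
df$y <- 1 + 0.2 * df$x1 + 0.05 * df$x2 + 0.01 * df$x1 * df$x2 + rnorm(30)

model3 <- lm(y ~ x1 * x2, data = df)   # same as y ~ x1 + x2 + x1:x2
b <- coef(model3)

new.df <- data.frame(x1 = c(1, 6, 12), x2 = c(1, 10, 20))
pred   <- predict(model3, newdata = new.df)

# Rebuild the same predictions by hand from the fitted coefficients
manual <- b[["(Intercept)"]] + b[["x1"]] * new.df$x1 +
          b[["x2"]] * new.df$x2 + b[["x1:x2"]] * new.df$x1 * new.df$x2
print(cbind(pred, manual))
```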


Lorenzo Negri


That's perfect, thank you. I am sorry to admit that I found a silly error in my script: the x2 values in my CSV are decimals, not whole numbers. I have now adjusted them to 0.1, 0.2 and so on, and it seems to be working. – Jo Harris Mar 30 '18 at 16:33
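A quick way to catch this kind of scale mismatch is to compare each column of `newdata` with the range seen when fitting. A minimal sketch; `outside_range()` is a hypothetical helper (not part of base R), and the small training set is made up, with `x2` stored as decimals:

```r
# Hypothetical helper: for each predictor, TRUE if any newdata value lies
# outside the range observed in the training data
outside_range <- function(train, newdata, vars = names(newdata)) {
  sapply(vars, function(v) {
    r <- range(train[[v]], na.rm = TRUE)
    any(newdata[[v]] < r[1] | newdata[[v]] > r[2])
  })
}

# Made-up training data: x1 in whole numbers, x2 in decimals
df <- data.frame(x1 = c(0, 3, 7, 12, 15, 20),
                 x2 = c(0.05, 0.3, 0.1, 0.6, 0.9, 0.4))

wrong.df <- data.frame(x1 = c(1, 6, 12), x2 = c(1, 10, 20))      # x2 on the wrong scale
right.df <- data.frame(x1 = c(1, 6, 12), x2 = c(0.1, 0.2, 0.3))  # same scale as df

print(outside_range(df, wrong.df))   # flags x2
print(outside_range(df, right.df))   # flags nothing
```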

score 0 · Answer 2 · answered Mar 31 '18 at 08:02


I found the problem. My response variable had been transformed to ensure the model assumptions were satisfied, so predict() returned values on the transformed scale.
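If a model is fitted to a transformed response, its predictions have to be passed through the inverse transformation to get back to the original units. The thread does not say which transformation was used; the sketch below assumes, purely for illustration, a log transform, whose inverse is `exp()` (a square-root transform would be undone with `^2`, and so on). Under normal errors, the back-transformed value estimates the median, rather than the mean, on the original scale. The data are hypothetical and simulated:

```r
# Hypothetical simulated data with a strictly positive response
set.seed(5)
df <- data.frame(x1 = runif(50, 0, 20), x2 = runif(50, 0, 1))
df$y <- exp(0.5 + 0.05 * df$x1 + 0.8 * df$x2 + rnorm(50, sd = 0.2))

# Assumption for illustration: the model is fitted on log(y)
model_log <- lm(log(y) ~ x1 + x2, data = df)

new.df <- data.frame(x1 = c(1, 6, 12), x2 = c(0.1, 0.2, 0.3))

pred_log <- predict(model_log, newdata = new.df)  # on the log scale
pred_y   <- exp(pred_log)                         # back on the original scale
print(cbind(pred_log, pred_y))
```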


Jo Harris

