Predicted values in R when regression has a factor

Question

I have the following regression:

a <- lm(y ~ factor(x) + z + factor(x) * z, data = dataset)

I want to get predicted values for when x = 1, for varying levels of z. I have been struggling to do this with the predict function.

Any help would be greatly appreciated.

Possible duplicate: https://stackoverflow.com/questions/14630056/generating-predicted-values-for-levels-of-factor-variable — MrFlick, Mar 06 '18 at 20:17
When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. When having problems with code, show what you actually tried and describe where the "struggle" is happening. — MrFlick, Mar 06 '18 at 20:19
The results object already includes fitted values for all observations. Is it that you want predicted values for specific combinations? Are you literally typing `x=1` or are you using the factor "value" e.g. x="red"? — Elin, Mar 06 '18 at 20:23

Maurits Evers · Answer 1 · 2018-03-06T20:33:33.567

For future posts, it is good practice to always include sample data. See the guidance on how to provide a minimal reproducible example, including sample data.

That aside, here is a simple example based on some sample data I generate.

# Generate sample data
set.seed(2017)
x <- as.numeric(gl(2, 10, 20))
z <- 1:20
y <- 4 * x + 0.5 * z + rnorm(20)

# Fit model
fit <- lm(y ~ as.factor(x) + z + as.factor(x) * z)
summary(fit)
#    
#Call:
#lm(formula = y ~ as.factor(x) + z + as.factor(x) * z)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-1.9283 -0.4702 -0.1270  0.7932  1.6648
#
#Coefficients:
#                Estimate Std. Error t value Pr(>|t|)
#(Intercept)      4.13695    0.79828   5.182 9.08e-05 ***
#as.factor(x)2    5.72079    2.17955   2.625  0.01839 *
#z                0.47615    0.12865   3.701  0.00194 **
#as.factor(x)2:z -0.09588    0.18195  -0.527  0.60544
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 1.169 on 16 degrees of freedom
#Multiple R-squared:  0.9522,   Adjusted R-squared:  0.9432
#F-statistic: 106.3 on 3 and 16 DF,  p-value: 8.896e-11

# Predict for x = 1, and z = 1:5
predict(fit, newdata = data.frame(x = 1, z = 1:5))
#1        2        3        4        5
#4.613097 5.089242 5.565388 6.041533 6.517679

Note that if you want to predict the response for new values of your predictor variables, you need to supply a data.frame through the newdata argument. Otherwise, predict returns the fitted values for your original data.
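To make that difference concrete, here is a small sketch that reuses the simulated data from this answer. It checks that predict without newdata gives the same values as fitted, and that a prediction for x = 1 can be reproduced by hand from the coefficients. The names `new_obs`, `pred`, `b`, and `by_hand` are introduced only for illustration.

```r
# Same simulated data and model as in the answer
set.seed(2017)
x <- as.numeric(gl(2, 10, 20))
z <- 1:20
y <- 4 * x + 0.5 * z + rnorm(20)
fit <- lm(y ~ as.factor(x) + z + as.factor(x) * z)

# Without newdata, predict() returns the fitted values for the original rows
all.equal(predict(fit), fitted(fit))

# With newdata, predict() evaluates the model at the supplied x and z values
new_obs <- data.frame(x = 1, z = 1:5)
pred <- predict(fit, newdata = new_obs)

# x = 1 is the reference level, so the factor and interaction terms drop out
# and the prediction reduces to intercept + slope * z
b <- coef(fit)
by_hand <- b[["(Intercept)"]] + b[["z"]] * new_obs$z
all.equal(unname(pred), by_hand)
```

For x = 2 the hand calculation would also need the `as.factor(x)2` and `as.factor(x)2:z` coefficients, which is why letting predict do the bookkeeping is usually the safer route.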

score 0 · Accepted Answer · edited Mar 06 '18 at 21:37


a <- lm(y ~ factor(x) + z + factor(x)*z, data=dataset)
df <- data.frame(x = c(1,1,1), z = c(1,2,3))
predict(a, df)

The idea is to create a data frame holding the values of x and z at which you want predictions, and then pass it to predict as the new data.
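If you want more than a handful of z values, the same approach can be sketched with expand.grid. The original `dataset` was never posted, so the one below is a simulated stand-in, and the name `grid` and the choice of 10 evenly spaced z values are assumptions made for illustration.

```r
# Hypothetical stand-in for `dataset`; the real data were not posted
set.seed(1)
dataset <- data.frame(x = rep(c(1, 2), each = 10), z = 1:20)
dataset$y <- 4 * dataset$x + 0.5 * dataset$z + rnorm(20)

a <- lm(y ~ factor(x) + z + factor(x) * z, data = dataset)

# Hold x at 1 and vary z over 10 evenly spaced values within its observed range
grid <- expand.grid(
  x = 1,
  z = seq(min(dataset$z), max(dataset$z), length.out = 10)
)

# Append the predictions so each row pairs a z value with its predicted y
grid$predicted <- predict(a, newdata = grid)
grid
```

The values of x in the new data frame have to match what is stored in the original data. If x is stored as labels such as "red" rather than numbers, use that label in place of 1.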

stlpro · answered Mar 06 '18 at 20:36 · edited by Ben Bolker, Mar 06 '18 at 21:37

@ShelbyGrossman if you are satisfied do you mind selecting it as an answer? Thanks! – stlpro Mar 06 '18 at 23:04
I upvoted it, but it seems to not reflect that. How would I go about selecting it as an answer? – Shelby Grossman Mar 07 '18 at 21:13
@ShelbyGrossman "tick" sign below the upvote and downvote arrows/triangles. – stlpro Mar 08 '18 at 22:45
