2

Suppose I have my posterior draws of Beta using MCMC. For each row, I have a set of betas. Is it possible to turn this row of betas into a model object in R so i can use the predict() function? Specifically, some of the betas are for categorical random variables so If I wanted to apply the betas manually it'll be difficult.

I think to do it manually I will have to turn each categorical variable into multiple columns of indicator variables.

  • Hi, and welcome to SO! When asking a question ([how to ask](https://stackoverflow.com/help/how-to-ask))please provide some data so we can see the layout ([guide on reproducible examples in R](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)). In particular, it's unclear how many categories you have and why they don't have separate betas in the first place. – Calum You Nov 02 '18 at 06:20

1 Answers1

2

You can do this by constructing a model matrix and multiplying it by a coefficient vector.

Suppose this is our model formula. Species is a categorical variable:

> m = Sepal.Length~Petal.Length+Petal.Width+Species

Now create a model matrix from that and the data:

> mm = model.matrix(m, data=iris)
> head(mm)
  (Intercept) Petal.Length Petal.Width Speciesversicolor Speciesvirginica
1           1          1.4         0.2                 0                0
2           1          1.4         0.2                 0                0
3           1          1.3         0.2                 0                0
4           1          1.5         0.2                 0                0
5           1          1.4         0.2                 0                0
6           1          1.7         0.4                 0                0

The Species has three levels, and you can see that all of the first few rows have zeroes for the two species columns, ergo they are of the other species.

To do predictions for some set of 5 coefficients, they need to be in the same order as the rows and with the same "constrast", ie the encoding of species is done the same way with two dummy variables. So your "beta" values have to look the same as the coefficients you'd get from lm on the data:

> mfit = lm(m, data=iris)
> mfit$coefficients
      (Intercept)      Petal.Length       Petal.Width Speciesversicolor 
      3.682982011       0.905945867      -0.005995401      -1.598361502 
 Speciesvirginica 
     -2.112646781 

Now you can get predictions for any set of coefficients by matrix multiplication:

> head(mm %*% mfit$coefficients)
      [,1]
1 4.950107
2 4.950107
3 4.859513
4 5.040702
5 4.950107
6 5.220692

If I've done that right, those values should equal the output from predict:

> head(predict(mfit))
       1        2        3        4        5        6 
4.950107 4.950107 4.859513 5.040702 4.950107 5.220692 

and if your coefficients are different you should get different predictions via matrix multiplying:

> head(mm %*% c(0,1,.2,.2,.4))
  [,1]
1 1.44
2 1.44
3 1.34
4 1.54
5 1.44
6 1.78
Spacedman
  • 92,590
  • 12
  • 140
  • 224