5

I prefer python over R for my work. From time to time, I need to use R functions, and I start to try Rpy2 for that purpose.

I tried but failed to find out how to replicate following with Rpy2

design <- model.matrix(~Subject+Treat)

I have gone as far as this:

import rpy2.robjects as robjects
fmla = robjects.Formula('~subject+treatment')
env = fmla.environment
env['subject'] = sbj_group
env['treatment'] = trt_group

from what I saw here. But I could not find how to perform model.matrix. I tried a couple of different ways:

robjects.r.model_matrix(fmla)
robjects.r('model.matrix(%s)' %fmla.r_repr())

As you can see none of them is right.

I am new to Rpy2, and fairly inexperienced in R. Any help would be appreciated!

Community
  • 1
  • 1
xyliu00
  • 726
  • 1
  • 9
  • 24

1 Answers1

4

You could evaluate strings as R code:

import numpy as np
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri
ro.numpy2ri.activate() 
R = ro.r

subject = np.repeat([1,2,3], 4)
treatment = np.tile([1,2,3,4], 3)
R.assign('subject', subject)
R.assign('treatment', treatment)
R('subject <- as.factor(subject)')
R('treatment <- as.factor(treatment)')
R('design <- model.matrix(~subject+treatment)')
R('print(design)')

yields

   (Intercept) subject2 subject3 treatment2 treatment3 treatment4
1            1        0        0          0          0          0
2            1        0        0          1          0          0
3            1        0        0          0          1          0
4            1        0        0          0          0          1
5            1        1        0          0          0          0
6            1        1        0          1          0          0
7            1        1        0          0          1          0
8            1        1        0          0          0          1
9            1        0        1          0          0          0
10           1        0        1          1          0          0
11           1        0        1          0          1          0
12           1        0        1          0          0          1
attr(,"assign")
[1] 0 1 1 2 2 2
attr(,"contrasts")
attr(,"contrasts")$subject
[1] "contr.treatment"

attr(,"contrasts")$treatment
[1] "contr.treatment"

R(...) returns objects which you can manipulate on the Python side. For example,

design = R('model.matrix(~subject+treatment)')

assigns a rpy2.robjects.vectors.Matrix to design.

arr = np.array(design)

makes arr the NumPy array

[[ 1.  0.  0.  0.  0.  0.]
 [ 1.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  1.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 1.  1.  0.  0.  0.  0.]
 [ 1.  1.  0.  1.  0.  0.]
 [ 1.  1.  0.  0.  1.  0.]
 [ 1.  1.  0.  0.  0.  1.]
 [ 1.  0.  1.  0.  0.  0.]
 [ 1.  0.  1.  1.  0.  0.]
 [ 1.  0.  1.  0.  1.  0.]
 [ 1.  0.  1.  0.  0.  1.]]

The column names can be accessed with

np.array(design.colnames)
# array(['(Intercept)', 'subject2', 'subject3', 'treatment2', 'treatment3',
#        'treatment4'], 
#       dtype='|S11')
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • Thank you! It works as described although not what I originally intended. A follow up question: how do I get 'design' back to python side, in case I want to use it in other statement? I see you use R.assign to pass to R. Is there an equivalent function to pull an object from R? Sorry for my ignorance. Thanks! – xyliu00 Feb 04 '15 at 05:23
  • `R(...)` returns objects which you can manipulate on the Python side. I've edited the post above to show what I mean. – unutbu Feb 04 '15 at 11:23