dynamically calling R library from python using rpy2

Question

based on https://stackoverflow.com/a/44827220/1639834:

I have an R routine that I need to call from my python code in a dynamic way. For this I intended to use rpy2.

First the R code I would like to make use of from python (first time R user):

setting up dummy data to showcase R routine usage

 set.seed(101)
 data_sample <- c(5+ 3*rt(1000,df=5),
        10+1*rt(10000,df=20))

 num_components <- 2

the routine itself

library(teigen)
 tt <- teigen(data_sample,
        Gs=num_components,  
        scale=FALSE,dfupdate="numeric",
        models=c("univUU") 
 )

df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

The arguments data_sample and num_components are computed dynamically by my Python code, where num_components is just an integer and data_sample is a NumPy array.

As an end goal I would like to have df, mean and scale back in the "Python world" as lists or NumPy arrays, so I can process them further and use them later in my program logic.

My first experiment to tackle this with rpy2 so far:

import rpy2
from rpy2.robjects.packages import importr
from rpy2 import robjects as ro

numpy_t_mix_samples = get_student_t_data(n_samples=10000)

r_t_mix_samples = ro.FloatVector(numpy_t_mix_samples)

teigen = importr('teigen')
rres = teigen.teigen(r_t_mix_samples, Gs=2, scale=False, dfupdate="numeric", models="univUU")

Here the argument for Gs is still hardcoded, but as laid out above it should later be dynamic.

rres then prints mostly incomprehensible output (I guess because it is not yet being converted properly by rpy2):

R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11e3fdd08 / R:0x7ff7cced0a28>
[156.000000]
  fuzzy: <class 'rpy2.robjects.vectors.Matrix'>
  R object with classes: ('matrix',) mapped to:
<Matrix - Python:0x11e3fd8c8 / R:0x118e78000>
[0.000000, 0.917546, 0.004050, ..., 0.077300, 0.076273, 0.091252]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
  ...
  iter: <class 'rpy2.robjects.vectors.FloatVector'>
  R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x11d632508 / R:0x7ff7cfa81658>
[-25365.912426]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]
R object with classes: ('teigen',) mapped to:
<ListVector - Python:0x11e3fdc48 / R:0x7ff7d229dcb0>
[Float..., Matrix, ListV..., ..., Float..., ListV..., ListV...]

All in all I am looking to get the same results as in the original R example in the first code box, except that the df, mean and scale variables should be Python lists or NumPy arrays. The fact that I don't know R at all makes using rpy2 quite difficult, and maybe there is a more elegant way to call this routine dynamically and get the results back into the Python world.
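
One way the hardcoded call above might be made dynamic is to wrap it in a function. The following is only a sketch under stated assumptions: `teigen_kwargs` and `fit_teigen` are hypothetical names that do not appear in the original post, and `fit_teigen` needs rpy2, R and the teigen package installed. The `int(...)` cast is a precaution, on the assumption that a NumPy integer may not be converted the same way as a plain Python int.

```python
def teigen_kwargs(num_components):
    """Build the keyword arguments for R's teigen() (hypothetical helper)."""
    return {
        "Gs": int(num_components),  # plain Python int, even if given a NumPy integer
        "scale": False,             # Python bool stands in for R's FALSE
        "dfupdate": "numeric",
        "models": "univUU",         # a plain string; R's c() is not needed here
    }


def fit_teigen(data_sample, num_components):
    """Fit the mixture via rpy2 and return the raw R result (hypothetical wrapper)."""
    # Imported lazily so that only this function requires rpy2 and R.
    from rpy2 import robjects as ro
    from rpy2.robjects.packages import importr

    teigen = importr("teigen")
    r_samples = ro.FloatVector(data_sample)  # accepts an iterable of floats
    return teigen.teigen(r_samples, **teigen_kwargs(num_components))
```

With this in place, a call such as `rres = fit_teigen(numpy_t_mix_samples, num_components)` would take both values from the surrounding Python code instead of hardcoding them.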

You might be getting exactly what you need, you just need to traverse the *rres* object which can be a python list/dictionary of very nested elements. Try printing by indexes: `rres[[0]][[0]][[0]]` but some items may need dict keys! Link shows output of nested R list containing vectors, matrices, even arrays. — Parfait, Jul 02 '17 at 20:09
There should be a way to access these attributes through rpy2 the same way as in R directly: `c(tt$parameters$df)`, without needing to guess the Python list/dictionary structure. — vare, Jul 03 '17 at 09:23

score 0 · Accepted Answer · answered Jul 03 '17 at 18:45

Consider using `x.names.index('myname')` to reference nested named elements in R objects (see the rpy2 docs). As demonstrated below, you can also reference nested elements in both R and Python by number index, keeping in mind that Python indexes from 0 while R indexes from 1.

To reproduce your R object with exactly the same random data, we need to run set.seed on the R side, as there is no easy way to find an equivalent random number generator across languages. See related post. Finally, base R's as.vector() is used to cast array objects to vectors. All values returned in Python are R FloatVectors: <class 'rpy2.robjects.vectors.FloatVector'>.

Python

from rpy2.robjects.packages import importr

base = importr('base')
stats = importr('stats')
teigen = importr('teigen')

base.set_seed(101)
data_sample = base.as_numeric([(5+3*i) for i in stats.rt(1000,df=5)] + \
                              [(10+1*i) for i in stats.rt(10000,df=20)])

num_components = 2

rres = teigen.teigen(data_sample, Gs=num_components, scale=False, 
                     dfupdate="numeric", models="univUU")

# BY NUMBER INDEX
df = rres[2][0]
mean = base.as_vector(rres[2][1])
scale = base.as_vector(rres[2][3])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588


# BY NAME INDEX 
# (i.e., find corresponding number to name in R object)
params = rres[rres.names.index('parameters')]

df = params[params.names.index('df')]
mean = base.as_vector(params[params.names.index('mean')])
scale = base.as_vector(params[params.names.index('sigma')])

print(df)
# [1]  3.578491 47.059841
print(mean)
# [1]  4.939179 10.002038
print(scale)
# [1] 8.763076 1.041588
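
The repeated `names.index(...)` lookups above could be folded into a small helper. This is a minimal sketch, not part of the original answer: `r_get` is a hypothetical name, and it relies only on the two things the code above already uses, a `.names` attribute with an `.index()` method and integer indexing.

```python
def r_get(obj, *names):
    """Follow a chain of element names through nested, named R objects."""
    for name in names:
        # .names holds the element names; .index() returns the 0-based position
        obj = obj[obj.names.index(name)]
    return obj
```

Under those assumptions, `r_get(rres, 'parameters', 'df')` would be equivalent to `params[params.names.index('df')]` above.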

R (equivalent script)

library(teigen)

set.seed(101)
data_sample <- c(5+ 3*rt(1000,df=5),
                 10+1*rt(10000,df=20))
num_components <- 2

tt <- teigen(data_sample, Gs=num_components, scale=FALSE, 
             dfupdate="numeric", models="univUU")    

# BY NUMBER INDEX
df = tt[[3]][[1]]
mean = as.vector(tt[[3]][[2]])
scale = as.vector(tt[[3]][[4]])

print(df)
# [1]  3.578491 47.059841     
print(mean)
# [1]  4.939179 10.002038     
print(scale)
# [1] 8.763076 1.041588

# BY NAME INDEX
df = c(tt$parameters$df)
mean = c(tt$parameters$mean)
scale = c(tt$parameters$sigma)

print(df)
# [1]  3.578491 47.059841    
print(mean)
# [1]  4.939179 10.002038    
print(scale)
# [1] 8.763076 1.041588
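
Since the Python values come back as R FloatVectors while the question asks for lists or NumPy arrays, one further step may be needed. The sketch below assumes only that the vector can be iterated like a Python sequence of numbers, which is how rpy2 vectors behave; `to_list` and `to_array` are hypothetical helper names, and `to_array` additionally needs NumPy installed.

```python
def to_list(r_vector):
    """Copy an R numeric vector (or any iterable of numbers) into a Python list."""
    return [float(x) for x in r_vector]


def to_array(r_vector):
    """Same values as a 1-D NumPy array."""
    import numpy as np  # imported lazily; only needed for the array form
    return np.asarray(to_list(r_vector), dtype=float)
```

For example, `df = to_list(df)` or `mean = to_array(mean)` after the extraction steps shown in the Python script would hand ordinary Python objects to the rest of the program.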

I just started to compose my own answer... :P I basically managed to access the attributes in a similar (equivalent) way: `rres.rx2('parameters').rx2('df')` for e.g the df attribute. — vare, Jul 03 '17 at 18:50
