
I'm trying to use rpy2 to run the multi.split function from the questionr package.

This is my code:

from rpy2 import robjects
from rpy2.robjects.packages import importr

questionr = importr(str('questionr'))

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

After the last line I get the following error:

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue",  : 
 'names' attribute [4] must be the same length as the vector [3]

I think that it has something to do with the size of the vector that I'm sending because if I remove the last item

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]

and then run

data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

I get no error message. Also, if I change the split_char argument, for example:

data_table = multi_split(data_vector, split_char='.')

I get no error message, no matter what size of vector I'm sending.

I have tried running the equivalent code directly in R (with RStudio) and it runs with no problems. Any ideas on how I can solve this issue?

shlomiLan

1 Answer


This seems to be because the function multi_split (multi.split in the R package) is trying to use the string representation of the expression associated with the first argument ("data_vector" here).

The signature of the R function is:

multi.split(var, split.char = "/", mnames = NULL)

and the documentation for mnames is:

names to give to the produced variabels. If NULL, the name are computed from the original variable name and the answers.

In the call multi_split(data_vector, split_char='/') the embedded R cannot see the variable name as this is a Python call and data_vector is an anonymous variable (only content, no variable name).
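To see why an anonymous object can break the column naming, here is a minimal pure-Python sketch of the mechanism. It assumes, as the error message suggests, that the column names are built by pasting the deparsed first argument onto each split level. Deparsing a long anonymous vector in R yields several strings rather than one, and the count of four used below is an assumption read off the error message, not something computed here. The helpers `r_paste` and `split_levels` are hypothetical stand-ins for the R behaviour, not part of rpy2 or questionr.

```python
from itertools import cycle, islice


def r_paste(left, right, sep="."):
    """Mimic R's paste(): the shorter vector is recycled to the longer length."""
    n = max(len(left), len(right))
    return [f"{a}{sep}{b}" for a, b in zip(islice(cycle(left), n), islice(cycle(right), n))]


def split_levels(values, split_char="/"):
    """Unique answers after splitting the sorted unique values (like factor levels)."""
    seen = []
    for level in sorted(set(values)):
        for part in level.split(split_char):
            if part not in seen:
                seen.append(part)
    return seen


data = ["red/blue", "green", "red/green", "blue/red"]
levels = split_levels(data)  # one output column per level

# Assumption: the anonymous vector deparses to 4 strings (placeholders here).
deparsed_anonymous = ["<chunk 1>", "<chunk 2>", "<chunk 3>", "<chunk 4>"]
names_anonymous = r_paste(deparsed_anonymous, levels)

# A named R symbol deparses to a single string.
names_named = r_paste(["mydata"], levels)

print(len(levels), len(names_anonymous), len(names_named))
```

Under these assumptions the anonymous call produces four names for three columns, which is the kind of length mismatch the error reports, while a named symbol produces exactly one name per column.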

I thought that you could specify mnames, but you checked and it does not work (see comments below). That matches what the code does: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161

A second option is to build the call as an R expression. An older post should provide the necessary bits for that: "What object to pass to R from rpy2?"

A third possibility is to creatively mix Python-strings-as-R-code:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
# bind the R vector to the symbol "mydata" in R's "GlobalEnv"
robjects.globalenv['mydata'] = data_vector
# the call is now a Python string evaluated as R code, so it must use
# the R symbol (mydata) and the R argument name (split.char, not split_char)
data_table = robjects.r("multi.split(mydata, split.char='/')")
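If you need this pattern more than once, the string-as-R-code approach can be wrapped in a small helper. The sketch below is only an illustration: `r_string`, `build_r_call` and `run_in_r` are hypothetical names, the underscore-to-dot translation is a naive reverse of the renaming that importr applies, and only string-valued arguments are handled. `run_in_r` needs a working rpy2 and R installation with questionr, so it is defined but not called here.

```python
def r_string(value):
    """Quote a Python str as an R string literal (basic escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_r_call(func, symbol, **kwargs):
    """Build R source for func(symbol, name = "value", ...).

    Keyword names are given Python-style (split_char) and turned back
    into R-style names (split.char) by replacing underscores with dots.
    """
    args = [symbol]
    for name, value in kwargs.items():
        args.append(f"{name.replace('_', '.')} = {r_string(value)}")
    return f"{func}({', '.join(args)})"


def run_in_r(data, code, symbol="mydata"):
    """Bind data to a named R symbol, then evaluate the R code string."""
    from rpy2 import robjects
    from rpy2.robjects.packages import importr

    importr("questionr")  # load the package so multi.split can be found
    robjects.globalenv[symbol] = robjects.StrVector(data)
    return robjects.r(code)


code = build_r_call("multi.split", "mydata", split_char="/")
print(code)
```

Because the vector is bound to a real R symbol before the call is evaluated, the R function sees a variable name rather than an anonymous object.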
lgautier
  • I have tried to add the `mnames` parameter as you can see here: `data_table = multi_split(data_vector, split_char='/', mnames=robjects.StrVector(['a', 'b']))` but I'm still getting the same error message. – shlomiLan Apr 13 '16 at 08:02
  • OK. I updated the answer. Hopefully one of the options is workable for you. Whenever the R code uses the unevaluated expression as a string to create labels or variable names, the use of anonymous objects can create trouble. – lgautier Apr 15 '16 at 01:26