I am attempting to use the R "NADA" package through the rpy2 interface in Python. The end goal is to perform survival analysis on left-censored environmental data. Python and R seem to interact correctly for other functions, and I am able to run a test call directly in R, but I get an error when attempting the same call through rpy2.

This is my Python code. The data are entirely fictitious.

from rpy2.robjects import FloatVector, BoolVector, FactorVector
from rpy2.robjects.packages import importr

nada = importr('NADA')
obs = FloatVector([1.0,2.0,3.0,5.0,56.0,1.0,4.0])
nds = BoolVector([False, True, True, True, True, False, True])
groups = FactorVector([1,0,1,0,1,1,0])

nada.cendiff(obs, nds, groups)

This is the error message I receive:

Traceback (most recent call last):
  File "C:/Users/XXXXXXX/rpy2_test.py", line 9, in <module>
    nada.cendiff(obs, nds, groups)
  File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 178, in __call__
    return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
  File "C:\Program Files\Python35\lib\site-packages\rpy2\robjects\functions.py", line 106, in __call__
    res = super(Function, self).__call__(*new_args, **new_kwargs)
rpy2.rinterface.RRuntimeError: Error in terms.formula(tmp, simplify = TRUE) : 
  invalid model formula in ExtractVars

This code works fine in the R terminal:

library("NADA")
cendiff(c(1.0,2.0,3.0,5.0,56.0,1.0,4.0), c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE), factor(c(1,0,1,0,1,1,0)))

I tried adding some print lines at the rpy2 error lines listed, and suspect there may be an issue with rpy2 removing the levels from the factor vector when sending them to the function. However, I'm in new territory and that may just be a red herring.

If anyone can lend some insight or offer an alternative, I would appreciate it. I have a lot of data processing coded in Python and going all R isn't a good option, but R has more analysis options so I was hoping rpy2 would do the trick.

Nate Wanner
  • @Parfait Thanks for the note! I had changed the lengths in multiple tests to see if that could be an issue, but it did not change anything. I must have copied and pasted different versions here. – Nate Wanner Jan 29 '17 at 22:31
  • It seems like [cendiff](https://www.rdocumentation.org/packages/NADA/versions/1.5-6/topics/cendiff) is a *front end* to routines in the `survival` package. So in R you may be accessing its dependency, which is not loaded in the Python environment. – Parfait Jan 29 '17 at 23:13
  • A simple import may not work as the two loaded in Python would be separate global objects (i.e., source code does not know the named object you give it). You may need to recreate `cendiff` call from original `survival`, digging through the package's code! – Parfait Jan 29 '17 at 23:16
  • You are correct, cendiff is essentially a front end to survdiff in the survival package. It reverses the censoring from left-censored to right-censored so the statistical formulas can be applied and takes care of some of the other variables. I've tried going through some of the source code and have identified parts of it (such as the flipping), but haven't located how it handles grouping yet. – Nate Wanner Jan 30 '17 at 00:17
  • I tested the Kendall Tau functionality in the NADA package, and it worked fine using logical and number vectors (BoolVector and FloatVector in rpy2). I suspect the problem has something to do with how rpy2 passes FactorVectors into the R package. – Nate Wanner Jan 30 '17 at 00:21
  • I tried importing survdiff beside cendiff. As @Parfait expected, it did not change things. – Nate Wanner Jan 30 '17 at 00:22
  • This is not an `rpy2` issue but a package dependency issue: `cendiff` requires `survival`, and the two cannot talk to each other in the Python environment without adjusting the source code. Is there a GitHub link for the source code? Or did you go through the R console? – Parfait Jan 30 '17 at 01:04
  • I downloaded package source from https://cran.r-project.org/web/packages/NADA/index.html – Nate Wanner Jan 30 '17 at 01:07
  • If it is a dependency issue, why would the Kendall Tau function work? Is it only a dependency issue when FactorVectors are involved? – Nate Wanner Jan 30 '17 at 01:08

1 Answer

When in doubt about whether rpy2 and/or one of its conversion rules is doing something unexpected, it is relatively easy to check. For example:

from rpy2.robjects.vectors import FactorVector
from rpy2.robjects import r, globalenv

# factor with rpy2
groups = FactorVector([1,0,1,0,1,1,0])
# bind it to symbol in R's GlobalEnv
globalenv['groups_rpy2'] = groups

# is it the same as building the factor in R?
r("""
groups <- factor(c(1,0,1,0,1,1,0))
print(identical(groups, groups_rpy2))
""")
[1] TRUE

# apparently so

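I suspect the error arises because the R library you are using works on (unevaluated) expressions, while rpy2 passes anonymous R objects: the function builds a model formula from the arguments as they appear in the call (via match.call()), so it expects names or expressions rather than the data themselves. I had a quick glance at that code and I can see: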

setMethod("cendiff", 
          signature(obs="numeric", censored="logical", groups="factor"), 
          cencen.vectors.groups)

and

cencen.vectors.groups =
function(obs, censored, groups, ...)
{
    cl = match.call()
    f = substitute(Cen(a, b)~g, list(a=cl[[2]], b=cl[[3]], g=cl[[4]]))
    f = as.formula(f)
    environment(f) = parent.frame()
    callGeneric(f, ...)
}

One way to work around that is to bind your objects to symbols in an R namespace/environment and evaluate the call in that namespace. Any R environment would do; the example here uses "GlobalEnv" (remember that the content of GlobalEnv persists until the embedded R is closed):

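from rpy2.robjects.packages import importr
base = importr('base')
# obs, nds, groups and nada are the objects defined in the question;
# globalenv was imported from rpy2.robjects above
# bind to R symbols
globalenv["obs"] = obs
globalenv["nds"] = nds
globalenv["groups"] = groups

# make the call, passing symbols rather than the anonymous objects
nada.cendiff(base.as_symbol('obs'),
             base.as_symbol('nds'),
             base.as_symbol('groups'))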

(See another use of as_symbol in "Minimal example of rpy2 regression using pandas data frame".)
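If this pattern comes up repeatedly, the bind-then-call steps could be wrapped in a small helper. The following is only a sketch: `call_with_symbols` and `run_cendiff_example` are hypothetical names, not part of rpy2 or NADA, and the helper assumes the environment accepts item assignment (as `globalenv` does in the snippet above) and that arguments should be passed positionally in the order the keywords are given.

```python
def call_with_symbols(r_func, env, as_symbol, **objects):
    """Bind each object to a name in `env`, then call `r_func` with those
    names (as R symbols) instead of the anonymous objects.

    Arguments are passed positionally, in the order the keywords are given.
    """
    for name, value in objects.items():
        env[name] = value  # the binding stays in `env` after the call
    symbols = [as_symbol(name) for name in objects]
    return r_func(*symbols)


def run_cendiff_example():
    """Hypothetical usage; needs R, rpy2 and the NADA package installed."""
    from rpy2.robjects import BoolVector, FactorVector, FloatVector, globalenv
    from rpy2.robjects.packages import importr

    base = importr('base')
    nada = importr('NADA')
    return call_with_symbols(
        nada.cendiff, globalenv, base.as_symbol,
        obs=FloatVector([1.0, 2.0, 3.0, 5.0, 56.0, 1.0, 4.0]),
        nds=BoolVector([False, True, True, True, True, False, True]),
        groups=FactorVector([1, 0, 1, 0, 1, 1, 0]),
    )
```

The helper takes the environment and `as_symbol` as parameters so it is not tied to GlobalEnv. Note that it overwrites any existing binding with the same name and does not clean up afterwards, so the bound objects persist in the chosen environment.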

lgautier
  • Thank you very much! It worked great with some quick cut and paste. I'll work on cleaning up my code and understanding this better tomorrow. I also appreciate the pandas link. Prior to your post, I had been trying to work around the issue by using a pandas data frame and the IPython console with rmagic, but wasn't convinced that would work once I started looping through generated SQL queries. I believe the globalenv at your link fills in a missing piece without the IPython complication. – Nate Wanner Feb 01 '17 at 03:40