I'm trying to replicate an example from the R package MNP using rpy2, in two different ways. In the first, I just use robjects.r with a string copied verbatim from the R code:

import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
import rpy2.robjects.pandas2ri
import rpy2.robjects.packages as rpackages

robjects.pandas2ri.activate()
mnp = rpackages.importr('MNP')
base = rpackages.importr('base')

r = robjects.r
r.data('detergent')
rcmd = '''\
mnp(choice ~ 1, choiceX = list(Surf=SurfPrice, Tide=TidePrice,
Wisk=WiskPrice, EraPlus=EraPlusPrice,
Solo=SoloPrice, All=AllPrice),
cXnames = "price", data = detergent, n.draws = 500, burnin = 100,
thin = 3, verbose = TRUE)'''

res = r(rcmd)

This works fine and reproduces what I get directly in R. I also wanted to try running this code using Python-accessible objects, passing in the data from a pandas dataframe:

import rpy2.rlike.container as rlc
df = robjects.pandas2ri.ri2py(r['detergent'])

choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'], 
                         tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))

res = mnp.mnp('choice ~ 1', 
              choiceX=['SurfPrice', 'TidePrice', 'WiskPrice', 'EraPlusPrice', 'SoloPrice', 'AllPrice'],
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

This fails with the error:

Error in xmatrix.mnp(formula, data = eval.parent(data), choiceX = call$choiceX,  : 
  Error: Invalid input for `choiceX.'
 You must specify the choice-specific varaibles at least for all non-base categories.

Replacing the R named list with an rpy2 TaggedList was suggested in another SO answer. If I remove the choiceX and cXnames arguments to mnp (they are optional), the code runs, so it looks like the pandas dataframe is being passed in correctly.

I'm not sure if the TaggedList isn't being properly interpreted as a named list once it gets into R, or if there is some issue with MNP not associating the contents of choiceX with the pandas dataframe.

Anyone have ideas of what might be going on here?

Update

Following @lgautier's suggestion, I modified my code to:

choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'), base.as_symbol('TidePrice'), 
                          base.as_symbol('WiskPrice'), base.as_symbol('EraPlusPrice'), 
                          base.as_symbol('SoloPrice'), base.as_symbol('AllPrice')], 
                         tags=('Surf', 'Tide', 'Wisk', 'EraPlus', 'Solo', 'All'))

res = mnp.mnp(robjects.Formula('choice ~ 1'), 
              choiceX=choiceX,
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

However, I get the same error as before.

Update 2

Following the workaround suggested by @lgautier, the following code:

choiceX = rlc.TaggedList([base.as_symbol('SurfPrice'),
                          base.as_symbol('TidePrice'), 
                          base.as_symbol('WiskPrice'),
                          base.as_symbol('EraPlusPrice'), 
                          base.as_symbol('SoloPrice'),
                          base.as_symbol('AllPrice')], 
                         tags=('Surf', 'Tide', 'Wisk',
                               'EraPlus', 'Solo', 'All'))

choiceX = robjects.conversion.py2ro(choiceX)
# add the names
choiceX.names = robjects.vectors.StrVector(('Surf', 'Tide',
                                            'Wisk', 'EraPlus',
                                            'Solo', 'All'))

res = mnp.mnp(robjects.Formula('choice ~ 1'), 
              choiceX=choiceX,
              cXnames='price', 
              data=df, n_draws=500, burnin=100,
              thin=3, verbose=True)

Still produces an error (albeit a different one):

Error in as.vector(x, mode) : 
  cannot coerce type 'symbol' to vector of type 'any'
---------------------------------------------------------------------------
RRuntimeError                             Traceback (most recent call last)
<ipython-input-21-7de5ad805801> in <module>()
      3               cXnames='price',
      4               data=df, n_draws=500, burnin=100,
----> 5               thin=3, verbose=True)

/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
    168                 v = kwargs.pop(k)
    169                 kwargs[r_k] = v
--> 170         return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
    171 
    172 pattern_link = re.compile(r'\\link\{(.+?)\}')

/Users/lev/anaconda/envs/rmnptest/lib/python2.7/site-packages/rpy2-2.5.6-py2.7-macosx-10.5-x86_64.egg/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
     98         for k, v in kwargs.items():
     99             new_kwargs[k] = conversion.py2ri(v)
--> 100         res = super(Function, self).__call__(*new_args, **new_kwargs)
    101         res = conversion.ri2ro(res)
    102         return res

RRuntimeError: Error in as.vector(x, mode) : 
  cannot coerce type 'symbol' to vector of type 'any'

1 Answer

The Python code does not correspond to your R code. You probably suspected as much, since you are posting, so the details are below. In short, R symbols and Python strings are not equivalent (although R confuses its own users by allowing both in some places, e.g., both library("MNP") and library(MNP) will work).

This is not unlike this question: pandas and rpy2: Why does ezANOVA work via robjects.r but not robjects.packages.importr?

...except that choiceX will be an unevaluated R expression rather than just a symbol.

The R code is:

data(detergent)
mnp(choice ~ 1,
    # ^- this is a "formula", which is an expression in R
    choiceX = list(Surf=SurfPrice, Tide=TidePrice,
                   Wisk=WiskPrice, EraPlus=EraPlusPrice,
                   Solo=SoloPrice, All=AllPrice),
    # ^- this is a list of objects, but with the cautionary note
    #    that R evaluates expressions in argument lazily. Therefore
    #    the safest is to have it as an R expression (it may or may
    #    not work if evaluated, but this depends on the code in
    #    `mnp`)
    cXnames = "price",
    # ^- this is a string
    data = detergent,
    n.draws = 500, burnin = 100,
    thin = 3, verbose = TRUE)

The Python code you have is (with comments about the differences):

choiceX = rlc.TaggedList(['SurfPrice', 'TidePrice', 'WiskPrice',
                          'EraPlusPrice', 'SoloPrice', 'AllPrice'], 
                         tags=('Surf', 'Tide', 'Wisk',
                               'EraPlus', 'Solo', 'All'))
# ^- this is a "tagged list", and the R equivalent would be
#    list(Surf="SurfPrice", Tide="TidePrice", Wisk="WiskPrice",
#         EraPlus="EraPlusPrice", Solo="SoloPrice", All="AllPrice")
#    Something closer to your R code above would be:
#    rlc.TaggedList([as_symbol('SurfPrice'), as_symbol('TidePrice'),
#                   ...
#                   tags=('Surf', 'Tide', ...))

res = mnp.mnp('choice ~ 1', 
              # ^- this is a string. To make it an R formula, do
              # robjects.Formula('choice ~ 1')
              choiceX=['SurfPrice', 'TidePrice', 'WiskPrice',
                       'EraPlusPrice', 'SoloPrice', 'AllPrice'],
              # ^- this should be choiceX defined above, I guess
              cXnames='price',
              # ^- this is a string, like in R 
              data=df,
              n_draws=500, burnin=100,
              thin=3, verbose=True)

Edit:

This means that the following should work:

choiceX = robjects.rinterface.parse("""
    list(Surf=SurfPrice, Tide=TidePrice,
         Wisk=WiskPrice, EraPlus=EraPlusPrice,
         Solo=SoloPrice, All=AllPrice)""")

Currently rpy2 does not offer many utilities for constructing R expressions. If the variable names are parameters at the Python level, you can consider something like:

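# Join the name=symbol pairs with ', ' so that the result is valid R code:
# list(Surf=SurfPrice, Tide=TidePrice, ...)
rcode = 'list(' + ', '.join('%s=%s' % (k, v)
                            for k, v in
                            (('Surf', 'SurfPrice'),
                             ('Tide', 'TidePrice'),
                             ('Wisk', 'WiskPrice'),
                             ('EraPlus', 'EraPlusPrice'),
                             ('Solo', 'SoloPrice'),
                             ('All', 'AllPrice'))) + ')'
choiceX = robjects.rinterface.parse(rcode)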
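
If the pairs come from elsewhere in your program, the string-building step could be wrapped in a small helper. This is only a sketch: `build_r_list_code` is a hypothetical name (not part of rpy2), and the name check is a deliberately simple assumption that accepts plain identifiers and does not exclude R reserved words. It only builds the R source text, so it runs without R installed.

```python
import re

# Assumption: accept only simple names made of letters, digits, '.' and '_',
# starting with a letter. This is stricter than R's own rules.
_SIMPLE_R_NAME = re.compile(r'^[A-Za-z][A-Za-z0-9._]*$')


def build_r_list_code(pairs):
    """Return R source text such as 'list(Surf=SurfPrice, Tide=TidePrice)'.

    `pairs` is an iterable of (tag, symbol_name) tuples.
    """
    items = []
    for tag, symbol in pairs:
        for name in (tag, symbol):
            if not _SIMPLE_R_NAME.match(name):
                raise ValueError('not a simple R name: %r' % (name,))
        items.append('%s=%s' % (tag, symbol))
    # The items must be separated by ', ' for the result to parse as R code.
    return 'list(' + ', '.join(items) + ')'


pairs = [('Surf', 'SurfPrice'), ('Tide', 'TidePrice'),
         ('Wisk', 'WiskPrice'), ('EraPlus', 'EraPlusPrice'),
         ('Solo', 'SoloPrice'), ('All', 'AllPrice')]
rcode = build_r_list_code(pairs)
print(rcode)

# With rpy2 available, the string could then be parsed as shown earlier:
# choiceX = robjects.rinterface.parse(rcode)
```

Rejecting anything that is not a simple name keeps a stray space or quote in a column name from silently producing malformed R code.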
  • Thanks for the suggestion. As indicated in the Update above, I believe I replicated what you were suggesting, but get an identical error. Am I still missing something? – JoshAdel Jul 03 '15 at 03:14
  • Thanks for looking into this. The first workaround still is giving an error. See **Update 2** in my original post. – JoshAdel Jul 06 '15 at 02:47
  • @JoshAdel . Ah, yes... choiceX should be an unevaluated R expression. The first workaround will not work. – lgautier Jul 08 '15 at 13:42