
I'm trying to define a Python function that involves rpy2 steps. This is my code:

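from rpy2 import robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.lib import ggplot2
from rpy2.robjects.lib.dplyr import DataFrame

dplyr = importr('dplyr')
# R's grDevices package, used to open and close the PNG device
grdevices = importr('grDevices')

# Read the CSV directly into an R data frame
df = robjects.DataFrame.from_csvfile('mydataframe.csv')

def boxplot(x):
    plot_df = (DataFrame(df).
               filter('VAR1 == x'))

    grdevices.png(file='boxplot.png')

    pp = (ggplot2.ggplot(plot_df) +
          ggplot2.aes_string(x='VAR1', y='VAR2') +
          ggplot2.geom_boxplot())

    pp.plot()

    grdevices.dev_off()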

But when I run boxplot(24), for example, I get this error: object 'x' not found.

How can I mix the two? According to the docs, the Python ** syntax could be a solution, but it is not clear how to use it.

Thanks

galactic
  • Please provide fuller context, including all assigned objects and the modules you `import`. It appears you are trying to run a Pandas data frame through R functions, which would need conversion to an R data frame, and Pandas `DataFrame.filter` does not filter rows by logical condition. Finally, in Python, and in R too, variables need to be concatenated or formatted into a string for their values to appear in it; here `x` is literally just the text `x`, not the value of the input parameter. – Parfait Nov 28 '20 at 16:44
  • Sorry, I've added some context. In fact, my data frame is imported as an R object. – galactic Nov 29 '20 at 07:54

1 Answer

The issue is here:

plot_df = (DataFrame(df).
           filter('VAR1 == x' ))

The string VAR1 == x will be evaluated as an R expression, but R does not know anything about your variable x defined in Python.

If x is a simple scalar you could try to just create a string that contains the value you want to filter on:

plot_df = (DataFrame(df).
           filter('VAR1 == %r' % x))
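
The same idea can be sketched with an f-string instead of the `%` operator. The helper `r_filter_expr` below is hypothetical (it is not part of rpy2 or dplyr); it only builds the expression string, and it assumes the value is a bool, a finite number, or a plain string:

```python
def r_filter_expr(column, value):
    """Build an R expression string such as 'VAR1 == 24'.

    Hypothetical helper, not part of rpy2. It assumes `value` is a bool,
    a finite number, or a str; anything else raises TypeError.
    """
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        r_value = "TRUE" if value else "FALSE"
    elif isinstance(value, (int, float)):
        r_value = repr(value)
    elif isinstance(value, str):
        # Escape backslashes and double quotes so R reads one string literal
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        r_value = f'"{escaped}"'
    else:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return f"{column} == {r_value}"


# Intended use inside boxplot(x), with the names from the question:
#     plot_df = DataFrame(df).filter(r_filter_expr('VAR1', x))
```

Building the string in one place keeps the quoting rules out of the plotting function, but it is still plain text substitution, so it is only suitable for values you control.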

Note: @Parfait's point about DataFrame is something I initially missed. This answer assumes a dplyr DataFrame (from rpy2), not a pandas DataFrame.

lgautier
  • Thanks, it worked. Yes, my df is in fact an R object. I've edited my post with some context. – galactic Nov 29 '20 at 08:06
  • Consider avoiding the modulo operator `%` for string formatting, which has been [de-emphasized in Python but not officially deprecated *yet*](https://stackoverflow.com/a/13452357/1422451). Instead, use `str.format` (Python 2.6+) or the newer f-string (Python 3.6+). – Parfait Nov 29 '20 at 16:56