I am using R Data Analysis Examples: Ordinal Logistic Regression as a guide for fitting an ordinal logistic regression (ultimately in Python using the rpy2 interface).

In the steps where they test the proportional odds assumption they create a table of the predicted estimates using the formula:

(s <- with(dat, summary(as.numeric(apply) ~ pared + public + gpa, fun=sf)))

One thing I noticed is that the behavior changes when the fun = argument is written in uppercase (FUN =). To see why, I looked at the source here: summary.R source, but found only FUN = there.

According to the UCLA site (in link above): "When R sees a call to summary with a formula argument, it will calculate descriptive statistics for the variable on the left side of the formula by groups on the right side of the formula and will return the results in a nice table. By default, summary will calculate the mean of the left side variable... However, we can override calculation of the mean by supplying our own function, namely sf to the fun= argument. The final command asks R to return the contents to the object s, which is a table."
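To make the quoted description concrete, here is a minimal plain-Python sketch of the same idea: apply a statistic to the left-side variable within each level of a right-side variable, defaulting to the mean unless another function is supplied. It is not Hmisc's implementation. The names `summarize_by_group` and `logit`, the toy rows, and this single-threshold `sf` are all hypothetical stand-ins (the UCLA `sf` returns several thresholds at once).

```python
import math
from collections import defaultdict
from statistics import mean


def logit(p):
    """Log-odds of p, the quantity R's qlogis(p) returns with default arguments."""
    return math.log(p / (1 - p))


def sf(y):
    """Hypothetical simplified stand-in for the custom function: log-odds of Y >= 2."""
    return logit(mean(1 if v >= 2 else 0 for v in y))


def summarize_by_group(rows, response, group, fun=mean):
    """Apply `fun` to the response values within each level of `group`."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[response])
    return {level: fun(values) for level, values in sorted(buckets.items())}


# Made-up toy data, only to exercise the two code paths.
rows = [
    {"apply": 1, "pared": 0},
    {"apply": 2, "pared": 0},
    {"apply": 1, "pared": 0},
    {"apply": 3, "pared": 0},
    {"apply": 2, "pared": 1},
    {"apply": 3, "pared": 1},
    {"apply": 1, "pared": 1},
    {"apply": 3, "pared": 1},
]

default_table = summarize_by_group(rows, "apply", "pared")         # per-group means
custom_table = summarize_by_group(rows, "apply", "pared", fun=sf)  # per-group log-odds
```

Leaving `fun` at its default gives the group means; passing `fun=sf` swaps in the custom statistic, which is the override the quote describes.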

I understand what this is doing, but I am not sure where the argument fun = is in terms of the source code (FUN seems to be the default, giving the left side of the formula and disregarding the function sf). Where is this override located? And is this actually documented some place? If so, where, since it is not obviously in the help documentation. This is the first time I've looked at the R source, so I will freely admit that I am clueless.

The reason I am digging into this is that the behavior in rpy2 is not consistent with that in R. In R, both fun = and FUN = produce output, but in rpy2 only FUN = produces output; fun = raises RRuntimeError: Error in as.character(substitute(fun)) : cannot coerce type 'closure' to vector of type 'character'

Thus the need to dig into the source to figure out why this is not working as expected.

EDIT

The Python lines that succeed and fail, respectively, are as follows (I created a package in R called gms.test, which contains the function/closure sf):

from rpy2.robjects import pandas2ri
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
pandas2ri.activate()
gms = importr("gms.test")
hmisc = importr('Hmisc')
base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})

r_consult_case_control = pandas2ri.py2ri(consult_case_control)
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')

formula.getenvironment()['es_score'] = r_consult_case_control.rx2('es_score')
formula.getenvironment()['n'] = r_consult_case_control.rx2('n')
formula.getenvironment()['raingarden'] = r_consult_case_control.rx2('raingarden')
formula.getenvironment()['consult_case'] = r_consult_case_control.rx2('consult_case')

# succeeds:
base._with(r_consult_case_control, ro.r.summary(formula, FUN=gms.sf))

# fails with given error:
base._with(consult_case_control, ro.r.summary(formula, fun=gms.sf))

Please note that debugging this code is not what I intended in this question. I just wanted to be able to see what the fun override in R was doing.

horcle_buzz
  • Please post both the *FUN* and *fun* Python lines that succeeds and fails. – Parfait Feb 12 '17 at 04:58
  • Done in edit above, although that was not the intent of this question (which is why I did not include it). I wanted to dive into the R code for the `fun` override to the `summary` method. I understand what is happening, in that R is trying to coerce the closure `sf` into a character string. Why/where it is trying to do so is what I would like to find out. – horcle_buzz Feb 12 '17 at 16:11
  • Hmmmm...since I do not have access to your package, I used `'mean'` for both *FUN* and *fun* with no error using `ro.r.summary()` and `base.summary()`. So I can't reproduce. It must be something about `gms.sf`. Possibly it uses an R or Python naming conflict. – Parfait Feb 12 '17 at 16:48
  • You used `mean`, in quotes, right? As in `'mean'`? If so, that is where I believe the problem is. The function `sf` is based on the same one given on the UCLA link above, and to the best of my lack of intellectual powers, I cannot figure out how I would pass an R closure as a string (as expected, I get that `sf` is a closure in both R and in Python). Regardless of which version of `sf` I use, theirs or mine, it has the same behavior. AND, I believe it is due to the fact that the function is NOT in quotes. If I do a `print(gms.sf)` in Python, it does return the correct closure. – horcle_buzz Feb 12 '17 at 17:06
  • I believe that, in R, both `mean` and `'mean'` are one and the same, which of course confounds everything when trying to port code over to Python. – horcle_buzz Feb 12 '17 at 17:32
  • Indeed, mean can be standalone or quoted in R. See this [SO post](http://stackoverflow.com/q/15419740/1422451). Your last question indicates you used the accepted answer, which failed for me, but using the [next answer](http://stackoverflow.com/a/15419900/1422451) worked for both *FUN* and *fun*. My *sf* is very simple, though: `sf <- function(y) { mean(y) }`, as I do not have R packages for `qlogis`. – Parfait Feb 12 '17 at 21:35
  • Thanks for this. I'll take a look. And any sf will do, really! ^_^ – horcle_buzz Feb 13 '17 at 02:23
  • I would continue this in chat, but I think this is something that is of benefit to others. Unless you think just posting the final resolution as an answer would be appropriate? This is becoming quite long... – horcle_buzz Feb 13 '17 at 03:16
  • I'm able to create and access the function both ways in the SO link, and also using the function imported in the package I created in R. One thing I did NOT do was to add parentheses, ala base._with(consult_case_control, ro.r.summary(formula, fun=gms.sf())) ... with all 3 methods, I get the same error: RRuntimeError: Error in mean(y >= 0) : argument "y" is missing, with no default. This leads to a really stupid question: how do I pass variables from the formula to the custom `fun` if I have parentheses? Oy! Talk about finicky – horcle_buzz Feb 13 '17 at 03:25
  • I'm running out of ideas. Which version of R do you have installed? rpy2? I cannot for the life of me get `fun` to work with the summary function (even your simple version of `sf` with the `mean`). `sf` though DOES work with the R `aggregate` function within python using the `FUN` argument, so at least I know there is nothing wrong with that. If I use it in `summary` though it gives the closure coercion error. – horcle_buzz Feb 13 '17 at 21:42

1 Answer

It's easy to get the source. As per how to view R source, I just ran getAnywhere(summary.formula) and got the relevant block of code:

if (length(fun)) {
    if (length(de <- deparse(fun)) == 2) {
        de <- as.list(fun)
        de <- as.character(de[[length(de)]])
        funlab <- if (de[1] == "apply") 
          de[length(de)]
        else de[1]
    }
    else funlab <- as.character(substitute(fun))
}

Thus, if the default fun is not used, it tries converting fun to a character vector, which is why I keep getting the closure coercion error. In R, I can recreate the error just by doing as.character(sf) ... (FUN, by the way, was a red herring, since any argument that is not fun has the same behavior). Oy, this is indeed not fun!
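One possible workaround, sketched in Python and not verified against this setup: hand R the whole call as source text, so that substitute(fun) sees a bare name such as sf rather than the closure object that rpy2 appears to pass. Both helpers (`build_summary_call`, `run_summary_in_r`) are hypothetical names. The sketch assumes rpy2 and Hmisc are installed, that the data object is already convertible to an R data frame, and that a function with the given name is visible in the R session.

```python
def build_summary_call(formula: str, data_name: str, fun_name: str) -> str:
    """Return R source text of the form with(<data>, summary(<formula>, fun = <fun>))."""
    return f"with({data_name}, summary({formula}, fun = {fun_name}))"


def run_summary_in_r(data, formula: str, fun_name: str = "sf", data_name: str = "dat"):
    """Bind `data` to `data_name` in R's global environment, then evaluate the call as text.

    Assumption: `fun_name` names a function R can already find (for example,
    one defined in the global environment or exported by an attached package).
    """
    import rpy2.robjects as ro  # imported lazily so build_summary_call works without rpy2

    ro.r("library(Hmisc)")           # attach Hmisc so its summary method for formulas is found
    ro.globalenv[data_name] = data   # make the data visible to R under a plain name
    return ro.r(build_summary_call(formula, data_name, fun_name))
```

For example, `build_summary_call("as.numeric(apply) ~ pared + public + gpa", "dat", "sf")` produces the same text as the R command from the tutorial, which is why R would then see `sf` as a symbol.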

horcle_buzz
  • Well, I kind of have it working, although it is a half-arsed solution. I import the Pandas data frame into rmagic in iPython and then am able to get the summary result table using the magic command. I suppose this does make sense, since it can do the conversion to character in R just fine. – horcle_buzz Feb 14 '17 at 02:13
  • I am going to accept this as the answer, since, even if I could get this working directly in python, I lose the nice summary table formatting I get in R via the magic command (I exported the table object back into python and good lord, it is UGLY!). Time to move onward... – horcle_buzz Feb 14 '17 at 02:59