I am using R Data Analysis Examples: Ordinal Logistic Regression as a guide to do an ordinal logistic regression (ultimately in Python using the rpy2 interface).
In the steps where they test the proportional odds assumption, they create a table of the predicted estimates using the formula:
(s <- with(dat, summary(as.numeric(apply) ~ pared + public + gpa, fun=sf)))
One thing I noticed is that the behavior of the fun = argument was different if fun was upper cased. To see why, I looked at the source here: summary.R source, but only FUN = was found.
According to the UCLA site (in link above): "When R sees a call to summary with a formula argument, it will calculate descriptive statistics for the variable on the left side of the formula by groups on the right side of the formula and will return the results in a nice table. By default, summary will calculate the mean of the left side variable... However, we can override calculation of the mean by supplying our own function, namely sf to the fun= argument. The final command asks R to return the contents to the object s, which is a table."
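To make sure I am reading that description correctly, here is a small pure-Python sketch of the behavior as I understand it: group the left-side variable by a right-side variable, apply the mean by default, and let a user-supplied function override it. This is only an analogue, not how summary is implemented; summarize_by_group and the toy data are hypothetical, and max stands in for sf.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_group(values, groups, fun=mean):
    """Apply `fun` to the values in each group (default: the mean)."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    # One summary per group, ordered by group label.
    return {group: fun(vals) for group, vals in sorted(buckets.items())}


# Hypothetical toy data: `apply` coded 1-3, grouped by `pared` (0/1).
apply_level = [1, 2, 3, 1, 1, 2]
pared = [0, 0, 1, 0, 1, 1]

# Default behavior: mean of the left-side variable within each group.
default_table = summarize_by_group(apply_level, pared)

# Override: any function of a group's values; `max` stands in for `sf`.
custom_table = summarize_by_group(apply_level, pared, fun=max)
```

In this sketch the override is an ordinary keyword argument, which is the behavior I expected from fun = in R.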
I understand what this is doing, but I cannot find where the fun = argument is defined in the source code (with FUN =, the default seems to apply: the left side of the formula is summarized and the function sf is disregarded). Where is this override located? And is it actually documented somewhere? If so, where, since it is not obviously in the help documentation? This is the first time I've looked at the R source, so I will freely admit that I am clueless.
The reason I am digging into this is that the behavior in rpy2 is not consistent with that in R. In R, both fun = and FUN = produce output, but in rpy2, only FUN = produces output; fun = throws this error:
RRuntimeError: Error in as.character(substitute(fun)) :
  cannot coerce type 'closure' to vector of type 'character'
Thus the need to dig into the source to figure out why this is not working as expected.
EDIT
The Python lines that succeed and fail are shown below, in that order (I created a package in R called gms.test, which contains the function/closure sf):
from rpy2.robjects import pandas2ri
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
pandas2ri.activate()
gms = importr("gms.test")
hmisc = importr('Hmisc')
base = importr('base', robject_translations={'with': '_with'})
stats = importr('stats', robject_translations={'format_perc': '_format_perc'})
r_consult_case_control = pandas2ri.py2ri(consult_case_control)
formula = stats.as_formula('es_score ~ n + raingarden + consult_case')
formula.getenvironment()['es_score'] = r_consult_case_control.rx2('es_score')
formula.getenvironment()['n'] = r_consult_case_control.rx2('n')
formula.getenvironment()['raingarden'] = r_consult_case_control.rx2('raingarden')
formula.getenvironment()['consult_case'] = r_consult_case_control.rx2('consult_case')
# succeeds:
base._with(r_consult_case_control, ro.r.summary(formula, FUN=gms.sf))
# fails with given error:
base._with(consult_case_control, ro.r.summary(formula, fun=gms.sf))
Please note that debugging this code is not what I intended in this question. I just wanted to be able to see what the fun override in R was doing.