Why is a function giving me a different answer from the code without the function?

Question

This is the code chunk I am trying to put in a custom function:

treated.median <- with(subset(dta, catholic == 1), Hmisc::wtd.quantile(score, probs = .5)) 
counterfactual <- with(subset(dta, catholic == 0), Hmisc::wtd.quantile(score, ipw_tot, probs = .5))
QTET <- treated.median - counterfactual
QTET

The output this gives me when I run it is:

50% 
-1.083

I tried to turn it into a function like this:

ZIB <- function(data, k, s, t) {
  treated.median <- with(data[k == 1,], Hmisc::wtd.quantile(s, probs = .5)) 
  counterfactual <- with(data[k == 0,], Hmisc::wtd.quantile(s, t, probs = .5))
  QTET <- treated.median - counterfactual
  return(QTET)
}

ZIB(dta, 
     dta$catholic, dta$score, dta$ipw_tot)

The output I get is this:

  50% 
-2.397

What am I missing here? Why am I getting two different answers? (I have a feeling that it might be a very silly thing that I am missing).

In a nutshell, because the functions you’re calling (`subset`, `with` …) use [non-standard evaluation](https://adv-r.hadley.nz/evaluation.html?q=non-standard%20evaluation). You can’t easily use variables here. If you read the documentation of `subset` it warns you not to use the function when programming. Use regular subsetting via `[` instead. — Konrad Rudolph, Nov 17 '20 at 09:16
So I did this and I still got the same problem `ZIB <- function(data, k, s, t) { treated.median <- with(data[k == 1,], Hmisc::wtd.quantile(s, probs = .5)) counterfactual <- with(data[k == 0,], Hmisc::wtd.quantile(s, t, probs = .5)) QTET <- treated.median - counterfactual return(QTET) } ZIB(dta, dta$catholic, dta$score, dta$ipw_tot)` — yacoub q, Nov 17 '20 at 09:48
You’ve correctly replaced the `subset` function but you’re still using the `with` function, which has the same issue. — Konrad Rudolph, Nov 17 '20 at 10:07
Where does the `for` loop come from? It’s not in your original code, and it does something quite different. Anyway, check out my answer. — Konrad Rudolph, Nov 17 '20 at 10:41

score 0 · Answer 1 · answered Nov 17 '20 at 10:40

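The issue is that functions such as `subset` and `with` use non-standard evaluation. They are useful for interactive exploration but complicated to use when programming. In your function, `with` looks for `s` and `t` among the columns of the subsetted data frame, does not find them, and falls back to the function arguments, which are still the full-length vectors. The row filter therefore never reaches them, and the quantiles are computed over all rows.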
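A minimal sketch of that lookup behaviour. The data are made up, and base R's `median` stands in for `Hmisc::wtd.quantile` so that the example runs without extra packages; the helper name `f` is hypothetical.

```r
# Toy data; the column names mirror the question, the values are made up.
dta <- data.frame(
  catholic = c(1, 1, 0, 0),
  score    = c(10, 20, 30, 40)
)

# Interactive version: `score` is found among the columns of the subset,
# so only the two rows with catholic == 1 are used.
interactive_result <- with(subset(dta, catholic == 1), median(score))

# Function version: `s` is not a column of `data`, so with() falls back to
# the function argument, which is still the full-length vector.
f <- function(data, k, s) {
  with(data[k == 1, ], median(s))
}
function_result <- f(dta, dta$catholic, dta$score)

interactive_result  # 15, the median of 10 and 20
function_result     # 25, the median of all four scores
```

The two results differ for the same reason as in the question: the subsetting inside the function has no effect on the vector that is actually passed to the summary function.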

You also need to think about how you want to pass column identifiers to your function. At the moment you pass ordinary vectors, not columns. This can be appropriate, but you likely don’t want users of your function to pass just any vector; you really want them to use columns of your data.

There are multiple ways of doing this, including non-standard evaluation, but in base R the easiest way is to pass the column names as strings. Then you can access the column inside your function using `[[` instead of `$`:

table$column

is equivalent to:

table[['column']]
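The practical difference shows up once the column name is held in a variable, which is the situation inside a function. A small sketch with a made-up data frame (`tbl` and `col` are hypothetical names):

```r
# Toy data frame; names and values are made up for illustration.
tbl <- data.frame(score = c(10, 20, 30))
col <- "score"

# `$` does not evaluate `col`; it looks for a column literally named "col".
by_dollar <- tbl$col

# `[[` evaluates `col` first and then looks up the column named "score".
by_name <- tbl[[col]]

by_dollar  # NULL
by_name    # the values of the `score` column
```

Applied to the function from the question, this gives: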

ZIB <- function(data, k, s, t) {
  # k, s and t are column names, passed as strings
  treated_median <- Hmisc::wtd.quantile(data[data[[k]] == 1, s], probs = 0.5)
  counterfactual <- Hmisc::wtd.quantile(data[data[[k]] == 0, s], data[data[[k]] == 0, t], probs = 0.5)
  treated_median - counterfactual
}

(I’ve removed the unnecessary `return` function call.)

You should also consider using more descriptive parameter names instead of `k`, `s` and `t`.

I would be tempted to rewrite the code slightly to (a) avoid a redundancy, and (b) make its meaning more explicit:

ZIB <- function(data, k, s, t) {
  # k, s and t are column names, passed as strings
  treated <- data[[k]] == 1
  treated_median <- Hmisc::wtd.quantile(data[treated, s], probs = 0.5)
  counterfactual <- Hmisc::wtd.quantile(data[! treated, s], data[! treated, t], probs = 0.5)
  treated_median - counterfactual
}
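As a rough illustration of the descriptive-names suggestion, here is a sketch of the same structure that runs in base R alone. Everything named here is an assumption for the example: `weighted_median` is a hypothetical stand-in for `Hmisc::wtd.quantile(..., probs = 0.5)` and uses a simpler rule (no interpolation), so its results need not match Hmisc's; `median_treatment_effect` and its parameter names are likewise made up, as are the data.

```r
# Hypothetical base-R stand-in for a weighted median: the smallest value
# whose cumulative share of the total weight reaches one half.
weighted_median <- function(x, w = rep(1, length(x))) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  x[which(cumsum(w) / sum(w) >= 0.5)[1]]
}

# Same structure as ZIB, with descriptive (hypothetical) parameter names.
# All three *_col arguments are column names given as strings.
median_treatment_effect <- function(data, treatment_col, outcome_col, weight_col) {
  treated <- data[[treatment_col]] == 1
  treated_median <- weighted_median(data[[outcome_col]][treated])
  counterfactual <- weighted_median(
    data[[outcome_col]][!treated],
    data[[weight_col]][!treated]
  )
  treated_median - counterfactual
}

# Made-up data using the column names from the question.
dta <- data.frame(
  catholic = c(1, 1, 1, 0, 0, 0),
  score    = c(50, 52, 54, 40, 45, 60),
  ipw_tot  = c(1, 1, 1, 1, 1, 4)
)

effect <- median_treatment_effect(dta, "catholic", "score", "ipw_tot")
effect
```

Because the columns are looked up by name inside the function, the outcome and the weights are always filtered with the same row mask, which is the step that went missing in the `with`-based version.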
