I want to create a basic function that would help me do this systematically:

Here's the data:

set.seed(1)
a <- as.numeric(c(-5:30))
b <- runif(30,min=0,max=1)

data <- as.data.frame(cbind(a,b))

And here's what I do:

data$adummy <- 0
data$adummy[data$a>0] <-1
obsa <- sum(data$adummy[data$adummy>0]) #Number of positive observations
areceiptshare <- (sum(data$adummy[data$adummy>0]*data$b[data$adummy>0])/sum(data$b))*100 #Weighted share of positive observations
areceiptshare

When I try to write a generic function:

wmean <- function (df,x,w) {

  df$adummy <- 0
  df$adummy[df$x>0] <-1
  obsa <- sum(df$adummy[df$adummy>0]) #Number of observations
  areceiptshare <- (sum(df$adummy[df$adummy>0]*df$w[df$adummy>0])/sum(df$w))*100

}

And plug the data into the function:

result <- wmean (df = data, x = a, w = b)

It yields NaN instead of the correct value (in this case 82.6063). What am I doing wrong? Why can't the function call the columns within the data frame? Thanks!

Juan C
    By the way, the function should be called wshare rather than wmean: it is not a mean of anything, but a share of the number of positive values of "a" ("x" in the function), weighted by the variable "b" ("w" in the function). – Juan C Oct 10 '19 at 16:01

1 Answer

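We can use [[ instead of $ and pass the column names as strings. Inside the function, df$x looks for a column literally named "x" (which does not exist, so it returns NULL), whereas df[[x]] uses the value of x as the column name: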

wmean <- function(df, x, w) {

  df[["adummy"]] <- 0
  df[["adummy"]][df[[x]] > 0] <- 1
  obsa <- sum(df[["adummy"]][df[["adummy"]] > 0]) #Number of observations
  areceiptshare <- (sum(df[["adummy"]][df[["adummy"]] > 0] *
                        df[[w]][df[["adummy"]] > 0]) / sum(df[[w]])) * 100

  areceiptshare
}

wmean(df = data, x = "a", w = "b")
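
To see where the NaN in the original version comes from, here is a minimal sketch on a small made-up data frame (the names toy and col are illustrative only, not from the question). It shows that $ takes the name after it literally, while [[ evaluates its argument:

```r
# Toy data; `toy` and `col` are hypothetical names used only for this demo
toy <- data.frame(a = c(-1, 2, 3), b = c(0.5, 0.25, 0.25))
col <- "a"

toy$col      # NULL: `$` looks for a column literally named "col"
toy[[col]]   # `[[` evaluates `col` and returns column "a"

# Summing NULL gives 0, so the original function ended up computing 0 / 0
sum(NULL) / sum(NULL)   # NaN
```

The same thing happens with df$x and df$w inside the original function: neither column exists, so both sums are zero.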
akrun
  • Thanks so much @akrun, I knew about enclosing with the brackets and had tried it (there are indeed questions about that), but I was not using the "" quoting properly, in particular for the new variable "adummy" created within the function! – Juan C Oct 10 '19 at 15:53
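
As a side note on the quoting of "adummy": the dummy column is not strictly needed. The following is a sketch of a more compact variant, not the answer's own code. It uses the name wshare suggested in the comment on the question, and assumes the columns contain no missing values:

```r
# Weighted share (in percent) of rows where column `x` is positive,
# weighted by column `w`; both column names are passed as strings.
# Assumes no NA values in either column.
wshare <- function(df, x, w) {
  positive <- df[[x]] > 0
  100 * sum(df[[w]][positive]) / sum(df[[w]])
}

# Made-up example data (not the data from the question)
toy <- data.frame(a = c(-1, 2, 3), b = c(0.5, 0.25, 0.25))
wshare(toy, x = "a", w = "b")   # 50
```

Because the dummy is either 0 or 1, multiplying the weights by it is the same as summing the weights of the positive rows, which is what the logical index does here.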