Going from a for loop to a function in R

Question

How can I convert a for loop that I've written into a function in R? I have no experience writing my own functions in R. I looked here and here, but neither offered much help. I am aware that for loops are not necessary, and overall I'm trying to do something similar to this blog post.

The for loop with reproducible data is here:

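library(caTools)  # provides combs()

P <- 1:50
y <- length(P)
D <- as.data.frame(combs(P, 2))  # one row per pair, in columns V1 and V2
Z <- choose(y, 2)                # number of pairs
Num <- numeric(Z)
Denom <- numeric(Z)
Diff <- numeric(Z)

for (n in 1:Z) {
    Num[n] <- abs(D$V1[n] - D$V2[n])
    Denom[n] <- max(D$V1[n], D$V2[n])
    Diff[n] <- Num[n] / Denom[n]
}
PV <- mean(Diff)
PV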

However, I'm interested in calculating PV for each level of a grouping variable, as in this data:

DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))

The final code I would like to use is therefore:

ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION)

So, if I could turn the above for loop into a working function, I could pass it to tapply to get PV for each level.

Any help, or any alternative to the approach above, would be appreciated.

Thanks!

The expected outcome would be using the function in tapply. I would want the function to return PV — Corey C., Aug 06 '15 at 15:55

jeremycg · Accepted Answer · 2015-08-06T16:30:33.533

Once you have your library loaded:

library(caTools)

Here's a function you can run on your data:

mymeandiff <- function(values){
    df <- as.data.frame(combs(values, 2))
    diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
    mean(diff)
}
mymeandiff(1:50)
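The main change from the loop is that max(), which collapses all of its arguments to a single value, becomes pmax(), which takes the maximum element by element. A small sketch with made-up vectors v1 and v2 (not from the question) shows the difference:

```r
# Illustrative pairs (1,2), (1,3), (2,3), split into two vectors
v1 <- c(1, 1, 2)
v2 <- c(2, 3, 3)

max(v1, v2)    # one number: the largest value across both vectors
pmax(v1, v2)   # one number per pair: the larger value of each pair

# The whole loop body, applied to every pair at once
ratio <- abs(v1 - v2) / pmax(v1, v2)
mean(ratio)
```

Using max() here would divide every difference by the same overall maximum, which is not what the loop computes.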

Then we can use dplyr to run it on each group, after converting DATA back to numeric:

mydf$DATA <-as.numeric(as.character(mydf$DATA))

library(dplyr)
mydf %>% group_by(NAME) %>%
         summarise(mymeandiff(DATA))
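The as.numeric(as.character(...)) step is needed because cbind() on a numeric and a character vector builds a character matrix, so DATA arrives in the data frame as text (or as a factor in R versions before 4.0). A minimal sketch of the problem, and of one way to avoid it by building the data frame directly; mydf_raw and mydf_ok are illustrative names:

```r
DATA <- 1:500
NAME <- c("a", "b", "c", "d", "e")

# cbind() coerces everything to a single type, here character
mydf_raw <- as.data.frame(cbind(DATA, NAME))
is.numeric(mydf_raw$DATA)   # FALSE

# data.frame() keeps each column's own type; NAME is recycled to 500 rows
mydf_ok <- data.frame(DATA, NAME)
is.numeric(mydf_ok$DATA)    # TRUE
```

With the second form, no conversion is needed before grouping.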

With tapply, rather than dplyr:

tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)
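tapply returns one value per level, named by that level. The sketch below shows this on a tiny made-up dataset. So that it runs without caTools, it assumes base R's combn() as a stand-in for combs(); the names toy and pv are hypothetical, and pv is meant to mirror mymeandiff rather than replace it.

```r
# Toy data, made up for illustration: two groups of three values each
toy <- data.frame(DATA = c(1, 2, 3, 2, 4, 8),
                  NAME = c("a", "a", "a", "b", "b", "b"))

# Same calculation as mymeandiff; combn() returns one pair per column
pv <- function(values) {
    pairs <- combn(values, 2)
    mean(abs(pairs[1, ] - pairs[2, ]) / pmax(pairs[1, ], pairs[2, ]))
}

res <- tapply(toy$DATA, toy$NAME, FUN = pv)
res   # a is 0.5, b is about 0.583
```

Individual results can then be pulled out by name, for example res["a"].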

Let's time it:

microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
                               dplyr = mydf %>% group_by(NAME) %>%
                                                summarise(mymeandiff(DATA)))
Unit: milliseconds
   expr      min       lq     mean   median       uq       max neval
 tapply 60.36543 61.08658 63.81995 62.61182 66.13671  80.37819   100
  dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364   100

tapply is slightly faster on average here, though the medians are within about a millisecond of each other.

Both of those work and provide the same result as my for loop. But I'm interested in how I would calculate PV for various levels, such as in this data: DATA <- c(1:500); NAME <- c("a", "b", "c", "d", "e"); mydf <- as.data.frame(cbind(DATA, NAME)) — Corey C., Aug 06 '15 at 16:12
Thanks! Precisely what I was looking for. Appreciate the help! — Corey C., Aug 06 '15 at 16:33
