How can I convert a for loop that I've written into a function in R? I have no experience writing my own functions in R. I looked here and here, but neither offered much help. I am aware that for loops are not necessary here; overall, I'm trying to do something similar to this blog post.

The for loop with reproducible data is here:

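library(caTools)  # provides combs()

P <- c(1:50)
y <- length(P)
D <- as.data.frame(combs(P, 2))  # every pair of values, one pair per row (V1, V2)
Z <- choose(y, 2)                # number of pairs
Num <- numeric(Z)
Denom <- numeric(Z)
Diff <- numeric(Z)

for (n in 1:Z) {
    Num[n] <- abs(D$V1[n] - D$V2[n])
    Denom[n] <- max(D$V1[n], D$V2[n])
    Diff[n] <- Num[n] / Denom[n]
}
PV <- mean(Diff)
PV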

But, I'm interested in calculating PV based on levels such as in this data:

DATA <- c(1:500)
NAME <- c("a", "b", "c", "d", "e")
mydf <- as.data.frame(cbind(DATA, NAME))

Therefore, my final code I would like to use would be:

ANSWER <- tapply(mydf$DATA, mydf$NAME, MY.FUNCTION) 

So, if I could turn the above for loop into a working function I could run the tapply function to get PV based on levels.

Any help would be appreciated, as would any alternative to the approach I've outlined.

Thanks!

Corey C.

1 Answer

Once you have your library loaded:

library(caTools)

Here's a function you can run on your data:

mymeandiff <- function(values){
    df <- as.data.frame(combs(values, 2))
    diff <- abs(df$V1 - df$V2)/pmax(df$V1, df$V2)
    mean(diff)
}
mymeandiff(1:50)
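
To check that the vectorised function really does the same thing as the loop in the question, here is a minimal base-R sketch. The names mymeandiff_base and loop_pv are hypothetical, and utils::combn stands in for caTools::combs so that the example runs without extra packages:

```r
# Hypothetical base-R variant of mymeandiff (combn instead of caTools::combs)
mymeandiff_base <- function(values) {
  pairs <- utils::combn(values, 2)        # 2 x choose(n, 2) matrix, one pair per column
  num   <- abs(pairs[1, ] - pairs[2, ])   # |a - b| for every pair
  denom <- pmax(pairs[1, ], pairs[2, ])   # element-wise max(a, b) for every pair
  mean(num / denom)
}

# The loop from the question, wrapped in a function for comparison
loop_pv <- function(P) {
  pairs <- t(utils::combn(P, 2))          # one pair per row
  Diff <- numeric(nrow(pairs))
  for (n in seq_len(nrow(pairs))) {
    Diff[n] <- abs(pairs[n, 1] - pairs[n, 2]) / max(pairs[n, 1], pairs[n, 2])
  }
  mean(Diff)
}

all.equal(mymeandiff_base(1:50), loop_pv(1:50))
```

The key change from the loop is pmax(), which takes the maximum pair by pair, whereas max() would collapse everything to a single number. Note that utils::combn treats a single positive integer as seq_len(x), so this sketch assumes at least two values per call.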

Then we can use dplyr to run it on each group, after converting DATA back to numeric (cbind() coerced it to text when mydf was built):

mydf$DATA <- as.numeric(as.character(mydf$DATA))

library(dplyr)
mydf %>% group_by(NAME) %>%
         summarise(mymeandiff(DATA))
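
The data correction is needed because of how mydf was built in the question. A small sketch of what happens, assuming the same DATA and NAME vectors (mydf2 is a hypothetical name for an alternative construction):

```r
DATA <- 1:500
NAME <- c("a", "b", "c", "d", "e")

# cbind() builds a matrix, and a matrix holds a single type,
# so mixing numbers with strings turns the numbers into text
m <- cbind(DATA, NAME)
is.character(m)

# data.frame() keeps each column's own type; NAME is recycled to 500 rows
mydf2 <- data.frame(DATA, NAME)
is.numeric(mydf2$DATA)
```

Building the data frame with data.frame() directly would probably avoid the as.numeric(as.character(...)) step altogether.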

With tapply, rather than dplyr:

tapply(mydf$DATA, mydf$NAME, FUN = mymeandiff)
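
To see what tapply returns, here is a toy example small enough to verify by hand. pv is a hypothetical base-R stand-in for mymeandiff (using utils::combn rather than caTools::combs), and the toy data frame is made up for illustration:

```r
# pv() is a hypothetical stand-in for mymeandiff, written with base R only
pv <- function(x) {
  p <- utils::combn(x, 2)                          # one pair per column
  mean(abs(p[1, ] - p[2, ]) / pmax(p[1, ], p[2, ]))
}

# Made-up data: two groups with values that are easy to check by hand
toy <- data.frame(DATA = c(1, 2, 2, 4, 8),
                  NAME = c("a", "a", "b", "b", "b"))

res <- tapply(toy$DATA, toy$NAME, pv)
res["a"]   # one pair (1, 2): |1 - 2| / 2
res["b"]   # pairs (2, 4), (2, 8), (4, 8): mean of 0.5, 0.75, 0.5
```

The result is a named array with one entry per level of NAME, so individual groups can be looked up by name.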

Let's time it:

microbenchmark::microbenchmark(tapply = tapply(mydf$DATA, mydf$NAME, FUN=mymeandiff),
                               dplyr = mydf %>% group_by(NAME) %>%
                                                summarise(mymeandiff(DATA)))
Unit: milliseconds
   expr      min       lq     mean   median       uq       max neval
 tapply 60.36543 61.08658 63.81995 62.61182 66.13671  80.37819   100
  dplyr 61.84766 62.53751 67.33161 63.61270 67.58688 287.78364   100

tapply is slightly faster, but the medians differ by only about 1 ms, so in practice the two are equivalent here.

jeremycg
  • Both of those work and provide the same result as my for loop. But I'm interested in how I would calculate PV for various levels. Such as in this data DATA <- c(1:500) NAME <- c("a", "b", "c", "d", "e") mydf <- as.data.frame(cbind(DATA, NAME)). – Corey C. Aug 06 '15 at 16:12
  • Thanks! Precisely what I was looking for. Appreciate the help! – Corey C. Aug 06 '15 at 16:33