
I want to apply several statistical computations that serve as reliability measurements, such as the ICC or the coefficient of variation. While I can compute them individually, I am not yet familiar enough with R's functional programming practices to run multiple computations directly without a lot of code repetition.

Consider the following example data.frame with repeated measures (T1, T2) on five different variables (Var1, ..., Var5):

set.seed(123)
df = data.frame(matrix(rnorm(100), nrow=10))
names(df) <- c("T1.Var1", "T1.Var2", "T1.Var3", "T1.Var4", "T1.Var5",
               "T2.Var1", "T2.Var2", "T2.Var3", "T2.Var4", "T2.Var5")

If I want to calculate an intraclass correlation coefficient between the two repeated measures of each variable, I could: 1) create a function that returns the ICC and its lower and upper bounds:

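library(psych)  # provides ICC()

# Return the ICC from row 3 of ICC()'s results table, with its lower and upper bounds
calcula_ICC <- function(a, b) {
  ICc <- ICC(matrix(c(a, b), ncol = 2))
  icc <- ICc$results[[2]][3]  # ICC estimate
  lo  <- ICc$results[[7]][3]  # lower bound
  up  <- ICc$results[[8]][3]  # upper bound
  round(c(icc, lo, up), 2)
}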

and 2) apply it to each corresponding variable as follows:

calcula_ICC(df$T1.Var1, df$T2.Var1)
calcula_ICC(df$T1.Var2, df$T2.Var2)
calcula_ICC(df$T1.Var3, df$T2.Var3)
calcula_ICC(df$T1.Var4, df$T2.Var4)
calcula_ICC(df$T1.Var5, df$T2.Var5)

I would then proceed similarly with other statistical computations on each variable, such as the coefficient of variation or the standard error between repeated measurements.

However, how could I use functional programming principles here? For instance, how could I create a function that takes each pair of corresponding T1 and T2 variables, as well as the desired function, as arguments?

AJMA
  • Have a look at [broom](https://cran.r-project.org/web/packages/broom/vignettes/broom.html) – Steven Beaupré Aug 11 '17 at 19:41
  • This problem would be *much* easier to solve if you get the data into a tidy format: https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns – Nathan Werth Aug 11 '17 at 19:46

2 Answers


The functional programming approach is to use mapply. No "tidying" required:

result = mapply(calcula_ICC, df[, 1:5], df[, 6:10], USE.NAMES=FALSE)

colnames(result) = paste0('Var', 1:5)

# Better than setting rownames here is to have calcula_ICC() return a named vector
rownames(result) = c('icc','lo','up')

> result
#      Var1  Var2  Var3  Var4  Var5
# icc  0.09  0.08 -0.37 -0.23 -0.17
# lo  -0.54 -0.55 -0.80 -0.73 -0.70
# up   0.66  0.65  0.29  0.43  0.48

(Note that the result is a matrix.)
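The same mapply idea can be wrapped in a small helper that also takes the statistic as an argument, which is what the question asks for. The following is only a sketch: `apply_paired`, `typical_error`, and `diff_summary` are hypothetical names, not part of any package, and the helper assumes the columns are named "T1.<name>" and "T2.<name>" in matching order. The example statistics use base R only, so psych is not needed to try it; calcula_ICC could be passed in the same way.

```r
# Example data from the question
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

# Hypothetical helper: apply a two-argument function f to each T1/T2 pair.
# Assumes column names of the form "T1.<name>" and "T2.<name>".
apply_paired <- function(data, f, ...) {
  t1 <- data[grep("^T1\\.", names(data))]
  t2 <- data[grep("^T2\\.", names(data))]
  vars <- sub("^T1\\.", "", names(t1))
  # Stop early if the T1 and T2 blocks do not line up
  stopifnot(identical(vars, sub("^T2\\.", "", names(t2))))
  names(t1) <- vars  # mapply labels its result with these names
  mapply(f, t1, t2, MoreArgs = list(...))
}

# Example statistics (hypothetical names)
# SD of the difference scores divided by sqrt(2)
typical_error <- function(a, b) sd(b - a) / sqrt(2)
# Named vector, so the result matrix gets row names automatically
diff_summary <- function(a, b) c(mean_diff = mean(b - a), sd_diff = sd(b - a))

te  <- apply_paired(df, typical_error)  # named vector, one value per variable
res <- apply_paired(df, diff_summary)   # matrix, one column per variable
```

Because `diff_summary` returns a named vector, the row names of `res` come for free, which is the improvement suggested in the comment above the rownames() call.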

sirallen

There are going to be a lot of approaches to this and I don't have time to post them all, but I may come back to add an lapply solution as well since the apply functions are very important in R.

Using dplyr and tidyr

Here is a dplyr and tidyr solution that may help:

require(dplyr)
require(tidyr)

# let's have a function for each value you want eventually
GetICC <- function(x, y) {
  require(psych)
  ICC(matrix(c(x, y), ncol = 2))$results[[2]][3]
}

GetICCLo <- function(x, y) {
  require(psych)
  ICC(matrix(c(x, y), ncol = 2))$results[[7]][3]
}

GetICCUp <- function(x, y) {
  require(psych)
  ICC(matrix(c(x, y), ncol = 2))$results[[8]][3]
}

# tidy up your data, take a look at what this looks like
mydata <- df %>%
  mutate(id = row_number()) %>%
  gather(key = time, value = value, -id) %>%
  separate(time, c("Time", "Var")) %>%
  spread(key = Time, value = value)

# group by variable, then run your functions
# notice I added mean difference between the two
# times as an example of how you can extend this
# to include whatever summaries you need
myresults <- mydata %>%
  group_by(Var) %>%
  summarize(icc = GetICC(T1, T2),
            icc_lo = GetICCLo(T1, T2),
            icc_up = GetICCUp(T1, T2),
            mean_diff = mean(T2) - mean(T1))

This works well as long as every expression you pass to summarize() returns a single value per group, so that all the summaries are computed at the same level.
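For readers on a newer tidyr, the reshaping step can also be written with pivot_longer() and pivot_wider(), which superseded gather() and spread() in tidyr 1.0.0. This is only a sketch assuming tidyr 1.0.0 or later; the object names `long` and `results` are illustrative, and base R summaries stand in for the ICC functions so that the example runs without psych.

```r
library(dplyr)
library(tidyr)

# Example data from the question
set.seed(123)
df <- data.frame(matrix(rnorm(100), nrow = 10))
names(df) <- c(paste0("T1.Var", 1:5), paste0("T2.Var", 1:5))

# One row per subject and variable, with T1 and T2 side by side
long <- df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = c("Time", "Var"), names_sep = "\\.") %>%
  pivot_wider(names_from = Time, values_from = value)

# Each summary returns exactly one value per group
results <- long %>%
  group_by(Var) %>%
  summarise(mean_diff = mean(T2 - T1),
            sd_diff   = sd(T2 - T1))
```

Any function of T1 and T2 that returns a single number, such as GetICC above, can be added to the summarise() call in the same way.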

Geoffrey Grimm