I have a data frame with 4 columns and 8 observations:

> df1
  Rater1 Rater2 Rater4 Rater5
1      3      3      3      3
2      3      3      2      3
3      3      3      2      2
4      0      0      1      0
5      0      0      0      0
6      0      0      0      0
7      0      0      1      0
8      0      0      0      0

I would like to compute the mean, median, IQR and SD of all Rater1 and Rater4 observations pooled together (16 values), and of all Rater2 and Rater5 observations pooled together (16 values), without creating a new data frame with 2 variables like this:

> df2
   var1 var2
1     3    3
2     3    3
3     3    3
4     0    0
5     0    0
6     0    0
7     0    0
8     0    0
9     3    3
10    2    3
11    2    2
12    1    0
13    0    0
14    0    0
15    1    0
16    0    0

I would like to obtain this (without a new data frame, working only on the first one):

> stat.desc(df2)
                   var1       var2
nbr.val      16.0000000 16.0000000
nbr.null      8.0000000 10.0000000
nbr.na        0.0000000  0.0000000
min           0.0000000  0.0000000
max           3.0000000  3.0000000
range         3.0000000  3.0000000
sum          18.0000000 17.0000000
median        0.5000000  0.0000000
mean          1.1250000  1.0625000
SE.mean       0.3275541  0.3590352
CI.mean.0.95  0.6981650  0.7652653
var           1.7166667  2.0625000
std.dev       1.3102163  1.4361407
coef.var      1.1646367  1.3516618

How can I do this in R?

Thank you in advance

ArTu
    [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data (not a picture of it), all necessary code, and a clear explanation of what you've tried and the output you want. – camille Feb 21 '20 at 18:33

4 Answers

We can loop over the pairs of column names with Map(), pool each pair of columns into a single vector, and compute the mean, median, IQR and sd of that vector:

out <- do.call(rbind, Map(function(x, y) {
  v1 <- c(df1[[x]], df1[[y]])  # pool the two columns into one vector
  data.frame(Mean = mean(v1), Median = median(v1),
             IQR = IQR(v1), SD = sd(v1))
}, names(df1)[1:2], names(df1)[3:4]))

row.names(out) <- paste(names(df1)[1:2], names(df1)[3:4], sep="_")
out
#                Mean Median  IQR       SD
#Rater1_Rater4 1.1250    0.5 2.25 1.310216
#Rater2_Rater5 1.0625    0.0 3.00 1.436141

data

df1 <- structure(list(Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0), Rater2 = c(3, 
3, 3, 0, 0, 0, 0, 0), Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0), Rater5 = c(3, 
3, 2, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
-8L))
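
The same pooling idea can be sketched for groups of any size by describing the groups as a named list. This is only an illustration: `pool_stats` and `groups` are hypothetical names introduced here, not part of the answer above, and the data are re-created so the block runs on its own.

```r
# Example data, copied from the question
df1 <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

# Hypothetical helper: pool the named columns into one vector and summarise it
pool_stats <- function(data, cols) {
  v <- unlist(data[cols], use.names = FALSE)
  data.frame(N = length(v), Mean = mean(v), Median = median(v),
             IQR = IQR(v), SD = sd(v))
}

# Hypothetical grouping; each entry may list any number of columns
groups <- list(Rater1_Rater4 = c("Rater1", "Rater4"),
               Rater2_Rater5 = c("Rater2", "Rater5"))

# One row of statistics per group, without storing a pooled data frame
out2 <- do.call(rbind, lapply(groups, pool_stats, data = df1))
out2
```

Adding a third group is then a matter of adding one more entry to `groups`.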
akrun

A possible base R approach:

df <- data.frame(                     # construct your original dataframe
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

combined <- data.frame(               # make a new dataframe with your desired variables
  R14 = with(df, c(Rater1, Rater4)),  
  R25 = with(df, c(Rater2, Rater5))  
)

sapply(combined, mean)                # compute mean of each column
sapply(combined, median)              # median
sapply(combined, sd)                  # standard deviation
sapply(combined, IQR)                 # interquartile range
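
If a single table is preferred over four separate calls, one possible sketch is to pass `sapply()` a function that returns a named vector. The data are re-created here so the block runs on its own; `stats` is just an illustrative name.

```r
df <- data.frame(
  Rater1 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater2 = c(3, 3, 3, 0, 0, 0, 0, 0),
  Rater4 = c(3, 2, 2, 1, 0, 0, 1, 0),
  Rater5 = c(3, 3, 2, 0, 0, 0, 0, 0)
)

combined <- data.frame(
  R14 = with(df, c(Rater1, Rater4)),
  R25 = with(df, c(Rater2, Rater5))
)

# Each column yields a named vector of four statistics;
# sapply() binds them into a matrix with one column per variable
stats <- sapply(combined, function(v) {
  c(mean = mean(v), median = median(v), sd = sd(v), IQR = IQR(v))
})
stats
```

The result has one row per statistic and one column per pooled variable, which is close to the layout of the `stat.desc()` output in the question.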

Aaron Montgomery

Another solution, using a for loop to print all four statistics for each pair. First, create vectors for the raters you want to combine:

# Raters 1 and 4:
r14 <- as.integer(unlist(df1[, c("Rater1", "Rater4")]))
# Raters 2 and 5:
r25 <- as.integer(unlist(df1[, c("Rater2", "Rater5")]))

Combine these vectors in a dataframe:

df <- data.frame(r14, r25)

And calculate the statistics (mean, IQR, median and sd, in that order):

for(i in seq_along(df)){
  print(c(mean(df[,i]), IQR(df[,i]), median(df[,i]), sd(df[,i])))
}
[1] 1.125000 2.250000 0.500000 1.310216
[1] 1.062500 3.000000 0.000000 1.436141
Chris Ruehlemann

A tidyverse/dplyr solution.

library(dplyr)

# Stack Rater1 on top of Rater4 (r14) and Rater2 on top of Rater5 (r25)
bind_rows(select(df1, r14 = Rater1, r25 = Rater2),
          select(df1, r14 = Rater4, r25 = Rater5)) %>%
  summarise_all(list(
    mean = mean,
    median = median,
    sd = sd,
    iqr = IQR
  ))
#>   r14_mean r25_mean r14_median r25_median   r14_sd   r25_sd r14_iqr r25_iqr
#> 1    1.125   1.0625        0.5          0 1.310216 1.436141    2.25       3

In case you want the output similar to the one in your question, use t() to transpose the result.

t(.Last.value)
Till