
Similar to this post, I want to group a data frame by an ID (let's say Month) and calculate the mean and standard deviation per group. The difference is that I want the two columns Rate1 and Rate2 to be combined into one, with the mean and sd calculated over both.

Name     Month  Rate1     Rate2
Aira       1      12        23
Aira       2      18        73
Aira       3      19        45
Ben        1      53        19
Ben        2      22        87

The data frame above should be grouped by Month, and for each month the mean rate should be calculated over both columns. For example, the mean of month 1 should be (12 + 23 + 53 + 19) / 4 = 26.75. I assume the approach for sd is similar.

Month Mean_rate
1     26.75
2     50
3     32
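
The month 1 figure can be checked directly in R. This is a minimal sketch that pools the four month 1 values from the table by hand; the variable name `month1` is only illustrative. Note that `sd()` computes the sample standard deviation (dividing by n - 1).

```r
# Rates for month 1, pooled across Rate1 and Rate2 (values taken from the table above)
month1 <- c(12, 23, 53, 19)

mean(month1)  # 26.75
sd(month1)    # sample standard deviation (n - 1 denominator)
```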
stefanbschneider

1 Answer

## Input data frame
df <- data.frame(Name=c("Aira","Aira","Aira","Ben","Ben"),
                 Month=c(1,2,3,1,2),
                 Rate1=c(12,18,19,53,22),
                 Rate2=c(23,73,45,19,87))

## Split data set on month
df_splitted <- split(df[,3:4],df$Month)

## Desired Output
df_out <- data.frame(Month=as.numeric(names(df_splitted)),
                     Mean=sapply(lapply(df_splitted,unlist),mean),
                     sd=sapply(lapply(df_splitted,unlist),sd),
                     stringsAsFactors=FALSE)
## Plot
plot(df_out$Month, df_out$Mean,
     ylim=range(c(df_out$Mean-df_out$sd, df_out$Mean+df_out$sd)),
     pch=19, xlab="Month", ylab="Mean +/- SD",
     main="Scatter plot with std.dev error bars")
arrows(df_out$Month, df_out$Mean-df_out$sd, df_out$Month,
       df_out$Mean+df_out$sd, length=0.05, angle=90, code=3)

## Explanation
# Split the rate columns into a list of data frames, one per month,
# so that rows with the same Month value stay together
temp1 <- split(df[,3:4],df$Month)

# Convert the list of data frames into list of vectors
temp2 <- lapply(temp1,unlist)

# Calculate the mean of every vector in the list
sapply(temp2,mean)

The result is a named numeric vector: each element is a mean, and its name is the month it was calculated for.
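
To make the shape of that result concrete, here is a small self-contained sketch that rebuilds the means and then separates the month labels from the values. The names `pooled`, `means`, `months` and `values` are illustrative, and the rate columns are selected by name rather than by position, which is an assumption made for readability.

```r
df <- data.frame(Month = c(1, 2, 3, 1, 2),
                 Rate1 = c(12, 18, 19, 53, 22),
                 Rate2 = c(23, 73, 45, 19, 87))

# Pool Rate1 and Rate2 per month, then take the mean of each pooled vector
pooled <- lapply(split(df[, c("Rate1", "Rate2")], df$Month), unlist)
means <- sapply(pooled, mean)

is.numeric(means)  # TRUE: a plain numeric vector
names(means)       # "1" "2" "3": month labels, stored as character strings

# Separate the labels from the values, e.g. for the x and y axes of a plot
months <- as.numeric(names(means))
values <- unname(means)
```

Because the names are character strings, they need `as.numeric()` before they can serve as x coordinates, which is what the `Month` column of `df_out` does.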

anonR
  • This works but could you explain what it does and what the resulting data type is? I want to use it to draw a scatter plot with errorbars. – stefanbschneider Jan 27 '17 at 17:35
  • Explanation added. – anonR Jan 27 '17 at 17:44
  • Sorry, I'm new to R and I'm having problems working with the resulting data. When I wanted to plot the means, I had to extract the months for the x-axis (as numbers) and then get just the means for the y-axis (using `names()` and `unname()`). Is there an easier way to do this? Or is it possible to store the result in a data frame like the one in my question? – stefanbschneider Jan 28 '17 at 09:38
  • The `Month` column in the data frame is now a factor, right? Unfortunately, I can't use that to draw arrows such as in [this post](http://stackoverflow.com/questions/13032777/scatter-plot-with-error-bars). I tried `as.numeric()` and `levels()` but the levels are in the wrong order and don't match the order of the other columns. – stefanbschneider Jan 29 '17 at 09:47
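
In the answer's code as shown, `Month` is built with `as.numeric(names(df_splitted))`, so it is already numeric. If a month column does end up as a factor, a common pitfall is that `as.numeric()` returns the internal level codes rather than the labels. The sketch below uses a hypothetical factor `month_factor` to show the usual conversion through `as.character()`.

```r
# Hypothetical factor of month numbers; levels are sorted as strings: "1", "10", "2"
month_factor <- factor(c("10", "2", "1"))

# Returns the integer level codes, not the month numbers
codes <- as.numeric(month_factor)

# Convert the labels to text first, then to numbers
months <- as.numeric(as.character(month_factor))
months  # 10 2 1

# Equivalent idiom that indexes the numeric levels by the factor
months_alt <- as.numeric(levels(month_factor))[month_factor]
```

Either conversion keeps the values in the same row order as the other columns, so they can be passed straight to `plot()` and `arrows()`.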