R calculate mean square for all sub groups in a subset

Question

how do I calculate the mean square of all 2019_Preston_STD,2019_Preston_V1,2019_Preston_V2 etc using the Value column, then the adjmth1, adjmth3 columns

structure(list(IDX = c("2019_Preston_STD", "2019_Preston_V1", 
"2019_Preston_V2", "2019_Preston_V3", "2019_Preston_W1", "2019_Preston_W2"
), Value = c(3L, 2L, 3L, 2L, 3L, 5L), adjmth1 = c(2.87777777777778, 
1.85555555555556, 2.01111111111111, 1.77777777777778, 3.62222222222222, 
4.45555555555556), adjmth3 = c(2.9328763348507, 2.08651828334684, 
2.80282946626847, 2.15028039284054, 2.68766916156347, 4.51425274916654
), adjmth13 = c(2.81065411262847, 1.82585524933201, 1.81394057737959, 
1.40785681078568, 3.30989138378569, 4.7301083495049)), row.names = 29:34, class = "data.frame")

Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. — r2evans, Mar 07 '22 at 00:45
sounds like summarizing by group, see https://stackoverflow.com/q/11562656/3358272, it's likely a dupe of that — r2evans, Mar 07 '22 at 00:45

Abdur Rohman · Accepted Answer · 2022-03-07T03:30:06.723

This task can be done in many ways, as shown in the link that @r2evans pointed out. My favorite one is dplyr using summarize(across() because to me its syntax is easy to understand and easy to apply to many columns. It also presents the resulted numbers in nice format.

For example, from iris data I want to get the arithmetic mean of Sepal.Length, Petal.Length, and Petal.Width for each of species : setosa, versicolor, and virginica. Here is the head of the data:

head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

And here is how to get the mean in each species:

iris %>% group_by(Species) %>% 
         summarize(across(c(Sepal.Length, Petal.Length, Petal.Width), mean))
# A tibble: 3 x 4
# Species    Sepal.Length Petal.Length Petal.Width
# <fct>             <dbl>        <dbl>       <dbl>
# 1 setosa             5.01         1.46       0.246
# 2 versicolor         5.94         4.26       1.33 
# 3 virginica          6.59         5.55       2.03

As for your task, first you need to define the function for the mean square (because its definition slightly varies in some references). Then, you apply it to your data frame using summarize(across()).

For example, you define the mean square function as follows:

meansq <- function(x) sum((x-mean(x))^2)/(length(x)-1)

Note: This definition requires that length(x) doesn't equal 1, or otherwise NaN will be produced.

You can apply it to your data frame newdata as follows:

newdata %>% group_by(IDX) %>% 
            summarize(across(c(Value, adjmth1, adjmth3), meansq)

Thanks Abdur , that will do the trick! – R.Merritt Mar 08 '22 at 06:19 — R.Merritt, Mar 08 '22 at 06:19

R calculate mean square for all sub groups in a subset

1 Answers1