How to calculate the variance of a specific variable across multiple datasets in R

Question

I have 3 data sets, each with variables time_tick, gyr_X_value, gyr_Y_value, and gyr_Z_value.

An example of one of the data sets is as follows:

  time_tick gyr_X_value gyr_Y_value gyr_Z_value
1       .01         .12         .24        -.28
2       .12           0           0         .05
3       .04         .10           0         .17
4       .03           0        -.25         .15

I know that I can calculate the variance of each individual data set with var(), but how can I calculate the variance of gyr_X_value across all three data sets?

When you say *data sets* I can only assume you mean a *data frame*... You need to elaborate on your question and show what you have tried so far and where you are actually stuck. Using [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) question to help you write a minimal reproducible example (or *reprex*) will help you get a more positive response. — Kevin Arseneau, Dec 14 '17 at 03:43

score 1 · Accepted Answer · answered Dec 14 '17 at 03:30

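We can place the datasets in a list, extract the 'gyr_X_value' column from each, and use rowVars from matrixStats if we need the variance of each row (i.e., of each time point across the three datasets). This assumes all datasets have the same number of rows, so that sapply returns a matrix with one column per dataset: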

library(matrixStats)
rowVars(sapply(list(df1, df2, df3), `[[`, 'gyr_X_value'))
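If installing matrixStats is not an option, base R's apply() should give the same per-row variances. A minimal sketch, assuming three data frames df1, df2, and df3 with equal numbers of rows; df1 reuses the question's gyr_X_value column, while the values in df2 and df3 are hypothetical:

```r
# df1 matches the question's example; df2 and df3 are hypothetical
df1 <- data.frame(time_tick = c(.01, .12, .04, .03),
                  gyr_X_value = c(.12, 0, .10, 0))
df2 <- data.frame(time_tick = c(.01, .12, .04, .03),
                  gyr_X_value = c(.20, .05, .10, .30))
df3 <- data.frame(time_tick = c(.01, .12, .04, .03),
                  gyr_X_value = c(.16, .10, .10, .15))

# sapply() simplifies to a 4 x 3 matrix: one row per time point, one column per dataset
x <- sapply(list(df1, df2, df3), `[[`, "gyr_X_value")

# Sample variance (n - 1 denominator) across the three datasets for each row,
# intended to match matrixStats::rowVars(x)
row_variances <- apply(x, 1, var)
row_variances
```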

If instead the goal is to find the variance of that column within each dataset, use var after extracting the column:

sapply(list(df1, df2, df3), function(x) var(x[['gyr_X_value']]))

Note: the objects are assumed to be named 'df1', 'df2', and 'df3'.
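A small variation is to name the list elements, which labels the output because sapply() carries list names through to its result. A minimal sketch; the list name dfs and the values in df2 and df3 are hypothetical, for illustration only:

```r
# Hypothetical example data; only the gyr_X_value column matters here
dfs <- list(
  df1 = data.frame(gyr_X_value = c(.12, 0, .10, 0)),
  df2 = data.frame(gyr_X_value = c(.20, .05, .10, .30)),
  df3 = data.frame(gyr_X_value = c(.16, .10, .10, .15))
)

# One variance per dataset; the result is named df1, df2, df3
per_dataset <- sapply(dfs, function(x) var(x[["gyr_X_value"]]))
per_dataset
```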

score 0 · Answer 2 · answered Dec 14 '17 at 03:40


You can use rbind. Given data frames a, b, and c with the same column names, they can be combined by row with

combined <- rbind(a,b,c)

See here for detailed usage. Then you can use var() as usual on a given column, for example, var(combined[, 2]) or var(combined$gyr_X_value).
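A minimal sketch of the rbind approach, using hypothetical data frames a, b, and c whose values are made up for illustration. The result treats all 12 gyr_X_value observations as a single sample, so it generally differs from both the per-row and the per-dataset variances in the accepted answer:

```r
# Hypothetical data frames; rbind() needs the same column names in each
a <- data.frame(time_tick = c(.01, .12, .04, .03), gyr_X_value = c(.12, 0, .10, 0))
b <- data.frame(time_tick = c(.02, .05, .06, .08), gyr_X_value = c(.20, .05, .10, .30))
c <- data.frame(time_tick = c(.03, .07, .09, .11), gyr_X_value = c(.16, .10, .10, .15))

# Stack the rows: 12 rows, same two columns
combined <- rbind(a, b, c)

# Pooled sample variance of gyr_X_value over all 12 observations
pooled_var <- var(combined$gyr_X_value)
pooled_var
```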


Alan Effrig


score 0 · Answer 3 · answered Dec 14 '17 at 05:21

For those kinds of problems, I strongly recommend the tidyverse approach.

Your data:

df <- read.table(text = "time_tick gyr_X_value  gyr_Y_value  gyr_Z_value
1   .01    .12             .24         -.28               
2   .12      0               0          .05
3   .04    .10               0          .17
4   .03      0            -.25          .15", header = TRUE)

The calculation:

library(tidyverse)

df %>% gather(variable, value, -time_tick) %>%
  group_by(variable) %>%
  summarize(variance = var(value))

## A tibble: 3 x 2
#     variable variance
#        <chr>    <dbl>
#1 gyr_X_value 0.004100
#2 gyr_Y_value 0.040025
#3 gyr_Z_value 0.043425

Explanation: First, the gather function turns your wide data frame into a long one:

df %>% gather(variable, value, -time_tick)
#   time_tick    variable value
#1       0.01 gyr_X_value  0.12
#2       0.12 gyr_X_value  0.00
#3       0.04 gyr_X_value  0.10
#4       0.03 gyr_X_value  0.00
#5       0.01 gyr_Y_value  0.24
#6       0.12 gyr_Y_value  0.00
#7       0.04 gyr_Y_value  0.00
#8       0.03 gyr_Y_value -0.25
#9       0.01 gyr_Z_value -0.28
#10      0.12 gyr_Z_value  0.05
#11      0.04 gyr_Z_value  0.17
#12      0.03 gyr_Z_value  0.15

The group_by() function then sets up the grouping by variable, and the summarize() function calculates the variance separately within each group.
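The pipeline above works on a single data frame; the same idea can be extended to several data sets. A sketch, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0 (where pivot_longer() supersedes gather()): bind_rows() stacks the data frames and records their source in a dataset column, after which the variance can be computed either pooled across all data sets or per data set. df2 and df3 are hypothetical copies of the example with different gyr_X_value values:

```r
library(dplyr)
library(tidyr)

# df1 is the question's example; df2 and df3 are hypothetical variants
df1 <- data.frame(time_tick   = c(.01, .12, .04, .03),
                  gyr_X_value = c(.12, 0, .10, 0),
                  gyr_Y_value = c(.24, 0, 0, -.25),
                  gyr_Z_value = c(-.28, .05, .17, .15))
df2 <- transform(df1, gyr_X_value = c(.20, .05, .10, .30))
df3 <- transform(df1, gyr_X_value = c(.16, .10, .10, .15))

# Stack the data frames (argument names become the 'dataset' column), then go long
long <- bind_rows(df1 = df1, df2 = df2, df3 = df3, .id = "dataset") %>%
  pivot_longer(-c(dataset, time_tick), names_to = "variable", values_to = "value")

# Variance of each variable pooled across all three data sets
pooled <- long %>%
  group_by(variable) %>%
  summarize(variance = var(value))

# Variance of each variable within each data set
per_dataset <- long %>%
  group_by(dataset, variable) %>%
  summarize(variance = var(value), .groups = "drop")
```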

