
I have 3 data sets, each with variables time_tick, gyr_X_value, gyr_Y_value, and gyr_Z_value.

An example of one of the data sets is as follows:

  time_tick gyr_X_value gyr_Y_value gyr_Z_value
1      0.01        0.12        0.24       -0.28
2      0.12        0.00        0.00        0.05
3      0.04        0.10        0.00        0.17
4      0.03        0.00       -0.25        0.15

I know that I can calculate the variance within each individual data set with var(), but how can I calculate the variance of gyr_X_value across all three data sets?

Claus Wilke
brum2393
When you say *data sets* I can only assume you mean *data frames*... You need to elaborate on your question and show what you have tried so far and where you are actually stuck. Using [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) question to write a minimal reproducible example (or *reprex*) will help you get a more positive response. – Kevin Arseneau Dec 14 '17 at 03:43

3 Answers


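We can place the data sets in a list, extract the 'gyr_X_value' column from each, and use rowVars from matrixStats if we need the variance of each row, that is, at each time step across the three data sets. This assumes the data sets have the same number of rows, so that sapply returns a matrix: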

library(matrixStats)
rowVars(sapply(list(df1, df2, df3), `[[`, 'gyr_X_value'))

If instead the goal is the variance of that column within each data set, apply var after extracting the column:

sapply(list(df1, df2, df3), function(x) var(x[['gyr_X_value']]))

Note: the objects are assumed to be named 'df1', 'df2', and 'df3'.
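
As a rough, self-contained sketch of the row-wise case without the matrixStats dependency: in base R, apply(m, 1, var) should give the same per-row variances as rowVars(m). Here df1 holds the gyr_X_value column from the question, while df2 and df3 are made-up values used only for illustration.

```r
# df1: gyr_X_value from the question; df2 and df3 are hypothetical data
df1 <- data.frame(gyr_X_value = c(0.12, 0.00, 0.10, 0.00))
df2 <- data.frame(gyr_X_value = c(0.10, 0.02, 0.08, 0.04))
df3 <- data.frame(gyr_X_value = c(0.14, 0.04, 0.06, 0.02))

# One column per data set, one row per time step (4 x 3 matrix)
m <- sapply(list(df1, df2, df3), `[[`, "gyr_X_value")

# Variance of each row, i.e. across the three data sets at each time step;
# base R stand-in for matrixStats::rowVars(m)
row_variances <- apply(m, 1, var)
print(row_variances)
```

The result has one value per time step, so it only makes sense when the rows of the three data sets line up with each other.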

akrun

You can use rbind. Given data frames a, b, and c, they can be combined by row with

combined <- rbind(a,b,c)

See the rbind documentation for detailed usage. Then you can use var() as usual on a given column, for example var(combined[, 2]), or equivalently var(combined$gyr_X_value).
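
A minimal sketch of this pooled approach, assuming all three data frames share the same column names. The first data frame uses values from the question; the other two are made up for illustration, and the third is named cc here (rather than c) only to avoid masking base R's c().

```r
# a: values from the question; b and cc are hypothetical data
a  <- data.frame(time_tick   = c(0.01, 0.12, 0.04, 0.03),
                 gyr_X_value = c(0.12, 0.00, 0.10, 0.00))
b  <- data.frame(time_tick   = c(0.01, 0.12, 0.04, 0.03),
                 gyr_X_value = c(0.10, 0.02, 0.08, 0.04))
cc <- data.frame(time_tick   = c(0.01, 0.12, 0.04, 0.03),
                 gyr_X_value = c(0.14, 0.04, 0.06, 0.02))

# Stack the rows of all three data frames (column names must match)
combined <- rbind(a, b, cc)

# Variance of gyr_X_value over all 12 pooled observations
pooled <- var(combined$gyr_X_value)
print(pooled)
```

Note that the pooled variance is generally not the average of the three per-data-set variances, because it also reflects differences between the data sets' means.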

Alan Effrig

For those kinds of problems, I strongly recommend the tidyverse approach.

Your data:

df <- read.table(text = "time_tick gyr_X_value  gyr_Y_value  gyr_Z_value
1   .01    .12             .24         -.28               
2   .12      0               0          .05
3   .04    .10               0          .17
4   .03      0            -.25          .15", header = TRUE)

The calculation:

library(tidyverse)

df %>% gather(variable, value, -time_tick) %>%
  group_by(variable) %>%
  summarize(variance = var(value))

## A tibble: 3 x 2
#     variable variance
#        <chr>    <dbl>
#1 gyr_X_value 0.004100
#2 gyr_Y_value 0.040025
#3 gyr_Z_value 0.043425

Explanation: First, the gather function turns your wide data frame into a long one:

df %>% gather(variable, value, -time_tick)
#   time_tick    variable value
#1       0.01 gyr_X_value  0.12
#2       0.12 gyr_X_value  0.00
#3       0.04 gyr_X_value  0.10
#4       0.03 gyr_X_value  0.00
#5       0.01 gyr_Y_value  0.24
#6       0.12 gyr_Y_value  0.00
#7       0.04 gyr_Y_value  0.00
#8       0.03 gyr_Y_value -0.25
#9       0.01 gyr_Z_value -0.28
#10      0.12 gyr_Z_value  0.05
#11      0.04 gyr_Z_value  0.17
#12      0.03 gyr_Z_value  0.15

The group_by() function then sets up the grouping by variable, and the summarize() function calculates the variance separately within the groupings.
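
For comparison, the same reshape-then-summarize idea can be sketched in base R, swapping in stack() for gather() and tapply() for group_by() plus summarize(). This is only an illustration of the technique on the data frame above, not part of the tidyverse approach itself.

```r
# Same data as in the question, built directly
df <- data.frame(
  time_tick   = c(0.01, 0.12, 0.04, 0.03),
  gyr_X_value = c(0.12, 0.00, 0.10, 0.00),
  gyr_Y_value = c(0.24, 0.00, 0.00, -0.25),
  gyr_Z_value = c(-0.28, 0.05, 0.17, 0.15)
)

# Wide to long: drop time_tick, then stack() puts every measurement in
# `values` and the name of its source column in `ind`
long <- stack(df[-1])

# Grouped summary: variance of `values` within each level of `ind`
variances <- tapply(long$values, long$ind, var)
print(variances)
```

The three variances should match the tibble shown earlier (0.0041, 0.040025, and 0.043425).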

Claus Wilke