I have a data.frame with the mean and standard error for two variables, var1 and var2. This data.frame, original_df, was created by computing those statistics for each of two groups:
original_df <- data.frame(group_dummy_code = c(0, 1),
                          var1_mean = c(1.5, 2.5),
                          var1_se = c(.025, .05),
                          var2_mean = c(3.5, 4.5),
                          var2_se = c(.075, .1))
> original_df
  group_dummy_code var1_mean var1_se var2_mean var2_se
1                0       1.5   0.025       3.5   0.075
2                1       2.5   0.050       4.5   0.100
I'm trying to use the tidyr function gather() to reshape the data.frame into desired_df in order to plot the two variables' means and standard errors:
desired_df <- data.frame(group_dummy_code = c(0, 1, 0, 1),
                         key = c("var1", "var1", "var2", "var2"),
                         val_mean = c(1.5, 2.5, 3.5, 4.5),
                         val_se = c(.025, .05, .075, .1))
> desired_df
  group_dummy_code  key val_mean val_se
1                0 var1      1.5  0.025
2                1 var1      2.5  0.050
3                0 var2      3.5  0.075
4                1 var2      4.5  0.100
I tried to gather() twice with the following:
library(dplyr)
library(tidyr)
original_df %>%
  gather(mean_key, mean_val, -group_dummy_code, -contains("se")) %>%
  gather(se_key, se_val, -group_dummy_code, -mean_key, -mean_val)
However, this produces too many rows, because each mean is paired with both standard errors rather than only its own:
  group_dummy_code  mean_key mean_val  se_key se_val
1                0 var1_mean      1.5 var1_se  0.025
2                1 var1_mean      2.5 var1_se  0.050
3                0 var2_mean      3.5 var1_se  0.025
4                1 var2_mean      4.5 var1_se  0.050
5                0 var1_mean      1.5 var2_se  0.075
6                1 var1_mean      2.5 var2_se  0.100
7                0 var2_mean      3.5 var2_se  0.075
8                1 var2_mean      4.5 var2_se  0.100
This seems like a fairly common processing step, especially after computing the mean and standard deviation for a number of variables, but calling gather() twice (once for the means and once for the standard errors) doesn't seem like a good approach.
Using tidyr (or dplyr or another package), how can I create desired_df from original_df?
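
For reference, here is a minimal sketch of one possible route. It is not necessarily the idiomatic one. It assumes every statistic column is named <variable>_<statistic> with exactly one underscore. The names key_stat, stat, val and reshaped_df are made up for illustration. The idea is to gather() only once, split the combined column name with separate(), and then spread() the statistic back out into columns:

```r
library(dplyr)
library(tidyr)

original_df <- data.frame(group_dummy_code = c(0, 1),
                          var1_mean = c(1.5, 2.5),
                          var1_se = c(.025, .05),
                          var2_mean = c(3.5, 4.5),
                          var2_se = c(.075, .1))

reshaped_df <- original_df %>%
  # one long column holding all means and standard errors
  gather(key_stat, val, -group_dummy_code) %>%
  # "var1_mean" -> key = "var1", stat = "mean" (assumes a single underscore)
  separate(key_stat, into = c("key", "stat"), sep = "_") %>%
  # one column per statistic: mean, se
  spread(stat, val) %>%
  rename(val_mean = mean, val_se = se) %>%
  # match the row order of desired_df
  arrange(key, group_dummy_code)

reshaped_df
```

Because the mean and its standard error share the same key after separate(), spread() keeps them on the same row, which avoids the cross-pairing seen with two gather() calls. In tidyr 1.0.0 and later, pivot_longer() with the ".value" sentinel in names_to may do the same in one step, though I have not compared the two here.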