Finding average values for multiple groups in tidy data in R

Question

I have a tidy dataframe with study data. "init_cont" and "family" represent the different conditions in this study. There are three possible options for init_cont (A, B, or C) and two possible options for family (D or E), yielding a 3x2 experimental design. In this example, there are two different questions that each participant must answer (specified in column "qnumber"). The "value" column indicates their response to the question asked.

id  init_cont  family  qnumber  value
1   A          D       1        5
1   A          D       2        3
2   B          D       1        4
2   B          D       2        2
3   C          E       1        4
3   C          E       2        3
4   A          E       1        5
4   A          E       2        2
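To make the examples reproducible, the sample table above can be rebuilt as an R data frame. This is a sketch: the object name `data` is assumed only because the answers below refer to it, and the column types (integer ids and question numbers, character condition labels, numeric responses) are inferred from the table.

```r
# Rebuild the 8-row sample table from the question (two rows per participant).
# The name `data` is an assumption chosen to match the answers' snippets.
data <- data.frame(
  id        = rep(1:4, each = 2),
  init_cont = rep(c("A", "B", "C", "A"), each = 2),
  family    = rep(c("D", "D", "E", "E"), each = 2),
  qnumber   = rep(1:2, times = 4),
  value     = c(5, 3, 4, 2, 4, 3, 5, 2)
)
data
```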

I am trying to find the best way (preferably within the tidyverse) to compute the average value for each question, separated by condition. There are 6 conditions, one for each combination of the 3 options in init_cont and the 2 options in family. In this dataframe there are only 2 questions, but the actual dataset has 14.

I know I could probably do this by making distinct dataframes for each of the 6 conditions and then breaking these down further to make distinct dataframes for each question, then finding the average values for each dataframe. There must be a better way to do this in fewer steps.

onlyphantom · Accepted Answer · score 1 · 2018-04-29T04:55:06.873

Using the tidyverse, you can average the values within each level of a condition variable, say, family. Note that this pools all questions into a single mean per family:

library(dplyr)

# Mean of value for each family (all questions pooled)
data %>% 
  group_by(family) %>% 
  summarize(avg_value = mean(value))
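Because grouping by family alone pools every question, it can hide exactly the per-question differences the question asks about. A small base-R check on the sample data (rebuilt here so the block runs on its own; `pooled` and `by_question` are illustrative names) makes this concrete:

```r
# Sample data from the question
data <- data.frame(
  id        = rep(1:4, each = 2),
  init_cont = rep(c("A", "B", "C", "A"), each = 2),
  family    = rep(c("D", "D", "E", "E"), each = 2),
  qnumber   = rep(1:2, times = 4),
  value     = c(5, 3, 4, 2, 4, 3, 5, 2)
)

# One mean per family, all questions pooled together
pooled <- tapply(data$value, data$family, mean)
pooled        # D = 3.5, E = 3.5

# One mean per family and question
by_question <- tapply(data$value, list(data$family, data$qnumber), mean)
by_question   # rows D/E, columns 1/2: 4.5 and 2.5 for both families
```

In this sample both families average 3.5 overall, while question 1 averages 4.5 and question 2 averages 2.5, so adding qnumber to the grouping is what exposes the per-question results.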

You can also group by a second (or further) variable. For example, with a hypothetical religion column (not in the sample data), this gives one average per family and religion combination:

data %>% 
  group_by(family, religion) %>% 
  summarize(avg_value = mean(value))

EDIT 1: Based on feedback, here's the code to get the average value grouped by init_cont, family, and qnumber:

data %>%
    group_by(init_cont, family, qnumber) %>%
    summarize(avg_value = mean(value))
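If a condition-by-question table is easier to read than the long grouped output (for example, 6 conditions by 14 questions in the full data), one base-R sketch is to paste the two condition columns into a single label and cross-tabulate the means with tapply. The `condition` column and `means` object are illustrative names, not part of the original answer; a tidyverse user might instead reshape the grouped result to wide format.

```r
# Sample data from the question
data <- data.frame(
  id        = rep(1:4, each = 2),
  init_cont = rep(c("A", "B", "C", "A"), each = 2),
  family    = rep(c("D", "D", "E", "E"), each = 2),
  qnumber   = rep(1:2, times = 4),
  value     = c(5, 3, 4, 2, 4, 3, 5, 2)
)

# Combine the two factors into one condition label, e.g. "A_D"
data$condition <- paste(data$init_cont, data$family, sep = "_")

# Rows = conditions, columns = questions, cells = mean value
means <- tapply(data$value, list(data$condition, data$qnumber), mean)
means
```

Only condition labels that occur in the data get a row, so this sample yields 4 of the 6 possible conditions.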


edited Apr 29 '18 at 04:55 · answered Apr 29 '18 at 03:39 · onlyphantom


Doing group_by(init_cont, family, qnumber) followed by summarize gave me exactly what I needed. – melbez Apr 29 '18 at 03:44
Glad to hear! If you need any more help, let me know and I can always improve the answer! – onlyphantom Apr 29 '18 at 03:45
@onlyphantom The above answer is not correct based on feedback from OP. Perhaps it would be better to update your answer to match OP's comment and data. It will be helpful for future users. – MKR Apr 29 '18 at 04:51

score 0 · Answer 2 · answered Apr 29 '18 at 03:50


We can use aggregate from base R (this call averages by family only):

aggregate(value ~ family, data, mean)
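The same base-R function can group by several variables at once by adding them to the right-hand side of the formula. A sketch of the equivalent of the accepted answer's edited dplyr code, run on the sample data (`res` is an illustrative name):

```r
# Sample data from the question
data <- data.frame(
  id        = rep(1:4, each = 2),
  init_cont = rep(c("A", "B", "C", "A"), each = 2),
  family    = rep(c("D", "D", "E", "E"), each = 2),
  qnumber   = rep(1:2, times = 4),
  value     = c(5, 3, 4, 2, 4, 3, 5, 2)
)

# One row per observed init_cont x family x qnumber combination,
# with the mean of value in the `value` column
res <- aggregate(value ~ init_cont + family + qnumber, data = data, FUN = mean)
res
```

aggregate only returns combinations that have observations, so the sample gives 8 rows rather than 3 x 2 x 2 = 12.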


akrun

