Suppose I have an R data.frame df which looks like this:

df = data.frame(matrix(rnorm(24*5), nrow=24, ncol=5))
f1 = rep(c('A', 'B', 'C', 'D'), each=6)
f2 = rep(c('i', 'ii'), times=12)
df$f1 = as.factor(f1)
df$f2 = as.factor(f2)
df

            X1           X2           X3           X4           X5 f1 f2
1   0.43199861  0.710242961  1.339928854 -1.241609127  0.222987482  A  i
2   1.38058957 -0.084379985  0.007244097 -1.505817169  1.841083186  A ii
3  -0.07266697  0.194356316 -0.566179369  1.178202899 -1.583327136  A  i
4  -0.10157803  0.137415112 -0.011487657 -0.324716212  1.161609061  A ii
5   0.98067650  1.824717342 -1.048111998 -0.825228970 -0.968037647  A  i
6   0.24261186 -2.116217786  0.027420259 -1.232210879 -1.868444772  A ii
7  -0.73898107 -0.883783872 -0.556182026 -1.662352192 -0.583576555  B  i
8  -1.25095555 -0.583574360  0.285764366  1.959217909  0.625261013  B ii
9  -0.30281764 -1.319204327 -0.984133568 -1.219553912 -0.059147710  B  i
10 -1.85947863  0.384337575  0.713635785 -1.101081205 -0.378312099  B ii
11 -0.50185467 -0.072254218  0.163350676 -1.718950235 -1.367719178  B  i
12  0.48938546 -0.005681783 -0.326662794  1.027273649 -0.490005391  B ii
13 -1.24160913  0.222987482  0.431998610  0.710242961  1.339928854  C  i
14 -1.50581717  1.841083186  1.380589565 -0.084379985  0.007244097  C ii
15  1.17820290 -1.583327136 -0.072666966  0.194356316 -0.566179369  C  i
16 -0.32471621  1.161609061 -0.101578026  0.137415112 -0.011487657  C ii
17 -0.82522897 -0.968037647  0.980676496  1.824717342 -1.048111998  C  i
18 -1.23221088 -1.868444772  0.242611864 -2.116217786  0.027420259  C ii
19 -1.66235219 -0.583576555 -0.738981072 -0.883783872 -0.556182026  D  i
20  1.95921791  0.625261013 -1.250955549 -0.583574360  0.285764366  D ii
21 -1.21955391 -0.059147710 -0.302817635 -1.319204327 -0.984133568  D  i
22 -1.10108120 -0.378312099 -1.859478634  0.384337575  0.713635785  D ii
23 -1.71895024 -1.367719178 -0.501854665 -0.072254218  0.163350676  D  i
24  1.02727365 -0.490005391  0.489385461 -0.005681783 -0.326662794  D ii

What is the best way to average this data over both factors? The averaging should produce 8 rows, since f1 has 4 levels and f2 has 2.

I've looked at by and aggregate. The idea was to use a formula to specify the groups. The problem is that I have many X variables so I can't write them all out in a formula.

CiaranWelsh
  • `df %>% group_by(f1, f2) %>% summarise_all(mean)` – Dan Mar 11 '18 at 16:57
  • I am not sure if that question linked in duplicate will answer your question. So here is one possible solution: `aggregate(df, by=list(df$f1, df$f2), FUN=mean)`. Note that `by` in aggregate can accept multiple factors, not just one. I am sure you can handle the rest. – Karolis Koncevičius Mar 11 '18 at 16:57
  • Thanks both for the suggestions. The linked answer solved my problem in another way, though the one posted by @KarolisKoncevičius was the one I was after. – CiaranWelsh Mar 11 '18 at 16:58
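
For reference, a minimal sketch of the `aggregate` approach mentioned in the comments, using base R only. In the formula interface, a dot on the left-hand side stands for every column not named on the right, so the X variables do not have to be written out. The `set.seed` call and the names `means`, `means_by` and `num_cols` are additions for illustration and are not part of the original question. Note that passing the whole `df` to the `by=` form also applies `mean` to the factor columns, which returns `NA` with a warning, so the sketch restricts that form to the numeric columns.

```r
# Rebuild the example data from the question (seed added for reproducibility).
set.seed(1)
df <- data.frame(matrix(rnorm(24 * 5), nrow = 24, ncol = 5))
df$f1 <- factor(rep(c("A", "B", "C", "D"), each = 6))
df$f2 <- factor(rep(c("i", "ii"), times = 12))

# Formula interface: "." means "all columns not used on the right-hand side",
# so every X column is averaged within each f1/f2 combination.
means <- aggregate(. ~ f1 + f2, data = df, FUN = mean)

# Equivalent by= form, limited to numeric columns so mean() is never
# applied to the factors themselves.
num_cols <- sapply(df, is.numeric)
means_by <- aggregate(df[num_cols],
                      by = list(f1 = df$f1, f2 = df$f2),
                      FUN = mean)

print(means)
```

Both calls should return one row per f1/f2 combination, with the grouping columns first and the averaged X columns after them.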
