
I have the following data frame with one numeric variable and two character variables. I want to show the mean of the numeric variable (z) for every combination of the character values. I used the aggregate function to do it:

df <- data.frame(x = sample(c("a", "b", "c"), 100, replace = TRUE),
                 y = sample(c("A", "B", "C"), 100, replace = TRUE),
                 z = rnorm(100, 2))
aggregate(df[, 3], list(id1 = df[, 1], id2 = df[, 2]), mean)

Output looks like this:

  id1 id2        x
1   a   A 2.052119
2   b   A 2.058046
3   c   A 2.397236
4   a   B 2.342341
5   b   B 2.182605
6   c   B 2.227108
7   a   C 1.733620
8   b   C 1.725497
9   c   C 1.966901

I'd like to transform it into the layout below, or use another function that produces this output directly:

         a        b        c
A 2.052119 2.058046 2.397236
B 2.342341 2.182605 2.227108
C 1.733620 1.725497 1.966901
Szymon Fraś
If you store the output of the `aggregate` function in `df1`, you could use `tidyr::pivot_wider(df1, names_from = id1, values_from = x)`. – Ronak Shah Nov 18 '20 at 00:54

1 Answer


An easier option is xtabs, which returns the sum of z for each combination of y and x:

xtabs(z ~  y + x, df)

Use tapply to get the mean instead:

with(df, tapply(z, list(y, x), FUN = mean))
#         a        b        c
#A 1.485499 2.785789 2.308899
#B 1.781358 2.406958 1.752952
#C 2.101626 1.873654 2.049453
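To connect the two base R options above: dividing the xtabs sums by the cell counts should give the same means as tapply. The following is only a sketch; the seed and the names `sums`, `counts`, `means` and `via_tapply` are assumptions added for illustration and are not part of the original answer.

```r
# Rebuild the example data; the seed is an assumption, used only for reproducibility
set.seed(42)
df <- data.frame(x = sample(c("a", "b", "c"), 100, replace = TRUE),
                 y = sample(c("A", "B", "C"), 100, replace = TRUE),
                 z = rnorm(100, 2))

sums   <- xtabs(z ~ y + x, df)   # sum of z in each (y, x) cell
counts <- xtabs(~ y + x, df)     # number of rows in each (y, x) cell
means  <- sums / counts          # element-wise division gives the cell means

# Same quantity computed directly with tapply
via_tapply <- with(df, tapply(z, list(y, x), FUN = mean))

# Compare the raw numbers, ignoring class and dimnames differences
all.equal(as.vector(means), as.vector(via_tapply))
```

If some combination of x and y never occurs, the division yields NaN for that cell, while tapply reports NA.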

Or use pivot_wider with values_fn = mean:

library(tidyr)
pivot_wider(df, names_from = x, values_from = z, values_fn = mean)
# A tibble: 3 x 4
#  y         a     c     b
#  <chr> <dbl> <dbl> <dbl>
#1 A      1.49  2.31  2.79
#2 C      2.10  2.05  1.87
#3 B      1.78  1.75  2.41
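If the aim is to reshape the output of aggregate itself, as the question asks, one base R possibility is to pass that output to xtabs. Each id1/id2 pair occurs only once in the aggregated result, so the "sum" in each cell is just the mean already computed. This is a sketch under assumptions: the seed and the names `df1` and `wide` are illustrative, and the column names id1, id2 and x follow the aggregate call in the question.

```r
# Rebuild the example data; the seed is an assumption, used only for reproducibility
set.seed(42)
df <- data.frame(x = sample(c("a", "b", "c"), 100, replace = TRUE),
                 y = sample(c("A", "B", "C"), 100, replace = TRUE),
                 z = rnorm(100, 2))

# Long format: one row per (id1, id2) pair; the mean lands in a column named x
df1 <- aggregate(df[, 3], list(id1 = df[, 1], id2 = df[, 2]), mean)

# Wide format: id2 as rows, id1 as columns.
# Every pair appears once in df1, so summing leaves the means unchanged.
wide <- xtabs(x ~ id2 + id1, data = df1)
wide
```

A combination that is missing from the data would show up as 0 here rather than NA, so the tapply version is safer when some cells may be empty.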
akrun