
I have a data frame with these values:

X1  X2      X3
s1  45.11   1
s1  45.13   1
s1  53.42   2
s1  51.41   2
s2  96.76   3
s2  96.65   3
s4  77.9    4
s1  80.46   5
s3  43.58   2
s1  43.12   2
s1  41.51   3
s4  41.97   3
s1  108.97  6
s3  117.46  6
s4  40      3
s4  40      3
s5  25.4    1
s5  25.5    1

I want to convert it to a data frame in this kind of format:

    s1     s2  s3     s4  s5
1   45.12  0   0      0   25.45
2   49.32  0   43.58  0   0

Each cell is the mean of the X2 values for one combination of X1 (column) and X3 (row). For example, the cell in column s1, row 1 is the mean of X2 over the rows where X1 is s1 and X3 is 1. Combinations with no matching rows should be 0.

How can I achieve this in R?

Dave2e
Daksh Agarwal

3 Answers


You could do this in base R (suppose your data are in data frame df):

r <- aggregate(X2~X1+X3, df[df$X3 %in% c(1,2),], mean)
round(t(xtabs(X2~X1+X3, r)), 2)

#   X1
#X3     s1    s2    s3    s4    s5
#  1 45.12  0.00  0.00  0.00 25.45
#  2 49.32  0.00 43.58  0.00  0.00
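For anyone who wants to reproduce this, here is a minimal self-contained sketch. It builds df from the values in the question and uses tapply, which is a different base R route from the aggregate/xtabs one above. The names m and res are just illustrative.

```r
# Data copied from the question
df <- data.frame(
  X1 = c("s1", "s1", "s1", "s1", "s2", "s2", "s4", "s1", "s3",
         "s1", "s1", "s4", "s1", "s3", "s4", "s4", "s5", "s5"),
  X2 = c(45.11, 45.13, 53.42, 51.41, 96.76, 96.65, 77.9, 80.46, 43.58,
         43.12, 41.51, 41.97, 108.97, 117.46, 40, 40, 25.4, 25.5),
  X3 = c(1, 1, 2, 2, 3, 3, 4, 5, 2, 2, 3, 3, 6, 6, 3, 3, 1, 1)
)

# Mean of X2 for every X3 (rows) by X1 (columns) combination;
# combinations with no data come back as NA
m <- tapply(df$X2, list(X3 = df$X3, X1 = df$X1), mean)

# Replace the empty combinations with 0, as requested in the question
m[is.na(m)] <- 0

# Keep only the rows for X3 = 1 and X3 = 2, rounded to two decimals
res <- round(m[c("1", "2"), ], 2)
res
```

Unlike the xtabs version, this computes the means for all X3 values first and subsets afterwards, so dropping the row filter gives the full table.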
989

Using data.table:

library(data.table)

setDT(df)  # convert df to a data.table by reference
df.mean <- df[, mean(X2), by = .(X1, X3)]
df.mean.wide <- dcast(df.mean, X3 ~ X1, value.var = "V1")
df.mean.wide[is.na(df.mean.wide)] <- 0
df.mean.wide[1:2]

   X3       s1 s2    s3 s4    s5
1:  1 45.12000  0  0.00  0 25.45
2:  2 49.31667  0 43.58  0  0.00
s_baldur

Alternatively, you can use the tidyr and dplyr packages. The following example breaks the task into two steps: (1) summarizing the data and (2) converting it to wide format:

library(dplyr)
library(tidyr)

# fake example data set
tibble(
  X1 = rep(paste0("S", 1:5), times = 6),
  X2 = c(1:30) * 0.1,
  X3 = rep(1:10, each = 3)
) %>% 
  # summarize to calculate mean for each X1 & X3 group
  group_by(X1, X3) %>%
  summarize(X2.avg = mean(X2)) %>% 
  # spread into wide format with 0s for all missing combinations
  spread(X1, X2.avg, fill = 0) %>% 
  # if you really only want to look at the first two X3s
  filter(X3 < 3)
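
As a rough sketch, the same two steps can be applied to the actual data from the question. This version swaps spread() for pivot_wider(), which tidyr now recommends in its place, and assumes reasonably recent dplyr and tidyr versions. The names df and wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Data copied from the question
df <- read.table(header = TRUE, text = "
X1 X2     X3
s1 45.11  1
s1 45.13  1
s1 53.42  2
s1 51.41  2
s2 96.76  3
s2 96.65  3
s4 77.9   4
s1 80.46  5
s3 43.58  2
s1 43.12  2
s1 41.51  3
s4 41.97  3
s1 108.97 6
s3 117.46 6
s4 40     3
s4 40     3
s5 25.4   1
s5 25.5   1
")

wide <- df %>%
  # step 1: mean of X2 for each X1 & X3 group
  group_by(X1, X3) %>%
  summarise(X2.avg = mean(X2), .groups = "drop") %>%
  # step 2: one column per X1, with 0 for missing combinations
  pivot_wider(names_from = X1, values_from = X2.avg, values_fill = 0) %>%
  # sort by X3 and keep only the first two X3 values
  arrange(X3) %>%
  filter(X3 < 3)

wide
```

The means are left unrounded here; wrap the value columns in round(., 2) if you need the two-decimal display from the question.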
sebkopf