Horizontal to Vertical Format in R

Question

I have a data frame with these values:

X1  X2      X3
s1  45.11   1
s1  45.13   1
s1  53.42   2
s1  51.41   2
s2  96.76   3
s2  96.65   3
s4  77.9    4
s1  80.46   5
s3  43.58   2
s1  43.12   2
s1  41.51   3
s4  41.97   3
s1  108.97  6
s3  117.46  6
s4  40      3
s4  40      3
s5  25.4    1
s5  25.5    1

I want to convert it to a data frame in this kind of format:

    s1     s2  s3     s4  s5
1   45.12  0   0      0   25.45
2   49.32  0   43.58  0   0

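Each value is the mean of the X2 column over the rows that match that cell's criteria. For example, the s1 / 1 cell is the mean of X2 over all rows where X1 is s1 and X3 is 1. Combinations with no matching rows (such as s2 with X3 = 1) are shown as 0.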
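As a worked check of that definition, a few cells of the desired output can be computed by hand from the X2 values in the table above. This sketch only hard-codes the relevant values; the variable names are illustrative. Note that in R, mean() of an empty vector is NaN, so the 0 entries in the desired output have to be filled in explicitly.

```r
# X2 values copied from the question's table for three X1/X3 combinations
s1_x3_1 <- c(45.11, 45.13)          # rows with X1 == "s1" and X3 == 1
s1_x3_2 <- c(53.42, 51.41, 43.12)   # rows with X1 == "s1" and X3 == 2
s5_x3_1 <- c(25.4, 25.5)            # rows with X1 == "s5" and X3 == 1

round(mean(s1_x3_1), 2)  # 45.12
round(mean(s1_x3_2), 2)  # 49.32
round(mean(s5_x3_1), 2)  # 25.45
```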

How can I achieve this in R?

Possible dupe [aggregate and reshape from long to wide](http://stackoverflow.com/q/23611735/903061), though there's probably a better one out there. — Gregor Thomas, Oct 10 '16 at 21:03
Sorry, forgot the `value.var` argument above, `reshape2::dcast(X3 ~ X1, data = df, fun.aggregate = mean, value.var = "X2")` should do it. — Gregor Thomas, Oct 10 '16 at 21:12
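Spelled out as a complete script, the reshape2 approach from this comment might look like the sketch below. It retypes df from the question's table and adds fill = 0, which is an assumption about what you want for empty cells: without it, reshape2 fills empty combinations with mean() of a zero-length vector, which is NaN. The final subset to X3 groups 1 and 2 is also an addition, to match the desired output.

```r
library(reshape2)

# The question's data (X1 = sample, X2 = measurement, X3 = group)
df <- data.frame(
  X1 = c("s1", "s1", "s1", "s1", "s2", "s2", "s4", "s1", "s3",
         "s1", "s1", "s4", "s1", "s3", "s4", "s4", "s5", "s5"),
  X2 = c(45.11, 45.13, 53.42, 51.41, 96.76, 96.65, 77.9, 80.46, 43.58,
         43.12, 41.51, 41.97, 108.97, 117.46, 40, 40, 25.4, 25.5),
  X3 = c(1, 1, 2, 2, 3, 3, 4, 5, 2, 2, 3, 3, 6, 6, 3, 3, 1, 1)
)

# One row per X3 value, one column per X1 value, each cell the mean of X2;
# fill = 0 is used for X1/X3 combinations that have no rows
wide <- dcast(df, X3 ~ X1, fun.aggregate = mean, value.var = "X2", fill = 0)

# Keep only the X3 groups shown in the desired output
wide[wide$X3 %in% c(1, 2), ]
```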

score 3 · Answer 1 · by 989 · 2016-10-10T21:27:36.463

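You can do this in base R (assuming your data are in a data frame df):

df$X1 <- factor(df$X1)  # keep s2 and s4 as columns even though they have no X3 in 1:2
r <- aggregate(X2~X1+X3, df[df$X3 %in% c(1,2),], mean)  # mean X2 per X1/X3 pair
round(t(xtabs(X2~X1+X3, r)), 2)  # spread into a table (0 for empty cells), X3 in rows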

#   X1
#X3     s1    s2    s3    s4    s5
#  1 45.12  0.00  0.00  0.00 25.45
#  2 49.32  0.00 43.58  0.00  0.00
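Another base-R route, sketched here as an alternative rather than a replacement, is tapply(), which computes the mean for every X3 x X1 cell in one call. tapply() converts its grouping vectors to factors itself, so every sample gets a column whether or not X1 is already a factor. The sketch retypes df from the question's table.

```r
# The question's data (X1 = sample, X2 = measurement, X3 = group)
df <- data.frame(
  X1 = c("s1", "s1", "s1", "s1", "s2", "s2", "s4", "s1", "s3",
         "s1", "s1", "s4", "s1", "s3", "s4", "s4", "s5", "s5"),
  X2 = c(45.11, 45.13, 53.42, 51.41, 96.76, 96.65, 77.9, 80.46, 43.58,
         43.12, 41.51, 41.97, 108.97, 117.46, 40, 40, 25.4, 25.5),
  X3 = c(1, 1, 2, 2, 3, 3, 4, 5, 2, 2, 3, 3, 6, 6, 3, 3, 1, 1)
)

# Mean of X2 for every X3 (rows) x X1 (columns) combination;
# tapply() returns NA for combinations that have no rows
m <- tapply(df$X2, list(X3 = df$X3, X1 = df$X1), mean)
m[is.na(m)] <- 0  # show empty combinations as 0, as in the desired output

# Keep the rows for X3 = 1 and 2 and round to two decimals
round(m[c("1", "2"), ], 2)
```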

edited Oct 10 '16 at 21:27

answered Oct 10 '16 at 21:20

score 1 · Answer 2 · answered Oct 10 '16 at 22:02

Using data.table:

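library(data.table)
setDT(df)                                  # convert df to a data.table in place
df.mean <- df[, mean(X2), by = .(X1, X3)]  # mean X2 per X1/X3 pair (column V1)
df.mean.wide <- dcast(df.mean, X3 ~ X1, value.var = "V1")  # X3 rows, X1 columns
df.mean.wide[is.na(df.mean.wide)] <- 0     # empty combinations -> 0
df.mean.wide[1:2]                          # first two rows: X3 = 1 and 2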

   X3       s1 s2    s3 s4    s5
1:  1 45.12000  0  0.00  0 25.45
2:  2 49.31667  0 43.58  0  0.00
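data.table's dcast() also accepts fun.aggregate and fill arguments, so the separate averaging step and the NA replacement can be folded into one call. A sketch of that variant, with the question's data retyped as a data.table:

```r
library(data.table)

# The question's data (X1 = sample, X2 = measurement, X3 = group)
dt <- data.table(
  X1 = c("s1", "s1", "s1", "s1", "s2", "s2", "s4", "s1", "s3",
         "s1", "s1", "s4", "s1", "s3", "s4", "s4", "s5", "s5"),
  X2 = c(45.11, 45.13, 53.42, 51.41, 96.76, 96.65, 77.9, 80.46, 43.58,
         43.12, 41.51, 41.97, 108.97, 117.46, 40, 40, 25.4, 25.5),
  X3 = c(1, 1, 2, 2, 3, 3, 4, 5, 2, 2, 3, 3, 6, 6, 3, 3, 1, 1)
)

# fun.aggregate = mean averages X2 within each X3/X1 cell,
# and fill = 0 is used for cells with no rows
wide <- dcast(dt, X3 ~ X1, value.var = "X2", fun.aggregate = mean, fill = 0)

# Rows for X3 = 1 and 2 only
wide[X3 %in% c(1, 2)]
```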

score 1 · Answer 3 · answered Oct 10 '16 at 22:22

Alternatively, you can use the dplyr and tidyr packages. The following example breaks the task into two steps: (1) summarizing your data and (2) converting it to wide format. It uses a made-up data set rather than yours:

library(dplyr)
library(tidyr)

# fake example data set
tibble(
  X1 = rep(paste0("S", 1:5), times = 6),
  X2 = c(1:30) * 0.1,
  X3 = rep(1:10, each = 3)
) %>%
  # summarize to calculate mean for each X1 & X3 group, then drop the grouping
  group_by(X1, X3) %>%
  summarize(X2.avg = mean(X2), .groups = "drop") %>%
  # spread into wide format with 0s for all missing combinations
  spread(X1, X2.avg, fill = 0) %>%
  # if you really only want to look at the first two X3s
  filter(X3 < 3)
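On the question's own data, the two steps can also be collapsed into a single pivot_wider() call, which tidyr now recommends over the superseded spread(). This is a sketch that assumes a tidyr release recent enough for values_fn to take a plain function and for values_fill and names_sort to be available; the data are retyped from the question.

```r
library(dplyr)
library(tidyr)

# The question's data (X1 = sample, X2 = measurement, X3 = group)
df <- tibble(
  X1 = c("s1", "s1", "s1", "s1", "s2", "s2", "s4", "s1", "s3",
         "s1", "s1", "s4", "s1", "s3", "s4", "s4", "s5", "s5"),
  X2 = c(45.11, 45.13, 53.42, 51.41, 96.76, 96.65, 77.9, 80.46, 43.58,
         43.12, 41.51, 41.97, 108.97, 117.46, 40, 40, 25.4, 25.5),
  X3 = c(1, 1, 2, 2, 3, 3, 4, 5, 2, 2, 3, 3, 6, 6, 3, 3, 1, 1)
)

wide <- df %>%
  # values_fn = mean averages X2 over rows sharing an X3/X1 pair,
  # values_fill = 0 fills pairs with no rows, names_sort orders s1..s5
  pivot_wider(id_cols = X3, names_from = X1, values_from = X2,
              values_fn = mean, values_fill = 0, names_sort = TRUE) %>%
  arrange(X3) %>%
  filter(X3 %in% c(1, 2))

wide
```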
