-1

I have a dataframe in long format like

id    varA    varB
1     'a'     112
1     'b'     212
1     'c'     308
2     'a'     99
2     'b'     123
2     'c'     452

I want to collapse per id the values in varB for varA == 'a' and varA == 'b'.

In a wide dataframe like:

id    varBa    varBb    varBc
1     112      212      308
2     99       123      452

... I'd simply use apply or the vectorised addition like df$collapsed = df$varBa + df$varBb.

How can I do this with the long-format dataframe? (Having only a wide format is not an option due to factor-wise follow-ups analysis).

Is it possible to collapse (or do any other arithmetic operation) in a long format so that I'd have an additional row for the collapsed values?.

ben_aaron
  • 1,504
  • 2
  • 19
  • 39

2 Answers2

1

Try this:

 dfw <- reshape(df,
 timevar = "varA",
 idvar = c("id"),
 direction = "wide")
 dfw
  id varB.a varB.b varB.c
1  1    112    212    308
4  2     99    123    452

Then you can do any math operations on the new data frame.

Robert
  • 5,038
  • 1
  • 25
  • 43
0

Using tidyr:

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L),
                     varA = c("a","b", "c", "a", "b", "c"), 
                     varB = c(112L, 212L, 308L, 99L, 123L, 452L)),
                .Names = c("id", "varA", "varB"), 
                class = "data.frame", 
                row.names = c(NA,-6L))

library(tidyr)
wide_df <- spread(df, varA, varB )

  id   a   b   c
1  1 112 212 308
2  2  99 123 452

If you want to change the headers into varBa etc, you could use dplyr's mutate function to change the varA a bit.

wide_df <- dplyr::mutate(df, varA = paste("varB",df$varA, sep="")) %>% 
  spread(varA, varB )

  id varBa varBb varBc
1  1   112   212   308
2  2    99   123   452
phiver
  • 23,048
  • 14
  • 44
  • 56