Suppose I have this data frame:

name <- rep(LETTERS[seq(from=1, to =2)], each=3)
MeasA <- c(1:6)
MeasB <- c(7:12)

df <- data.frame(name, MeasA, MeasB)

I want to reshape it into a wide format that has no idvar, like this:

MeasA_A MeasB_A MeasA_B MeasB_B
 1        7        4      10
 2        8        5      11
 3        9        6      12

I have been reading about reshape and melt:

Reshaping data frame with duplicates

http://seananderson.ca/2013/10/19/reshape.html

But with those functions I need to specify an idvar. I've tried:

tt <- reshape(df, timevar = "name", direction="wide")

and

tt <- dcast(df, ~name)

But they clearly don't work. Perhaps I need to use split (Split data.frame based on levels of a factor into new data.frames) and then reshape?

Pete900
  • Do you want to get the `mean` value after dcast? Try `library(data.table); setDT(df)[, ind:=LETTERS[1:.N], name]; dcast(df, name~ind, value.var=c('MeasA', 'MeasB'), sep="_", mean)` – akrun Sep 07 '15 at 12:54
  • I don't need the mean values. Just columns with the measurement (MeasA or MeasB) concatenated with name (and the corresponding data under each). I will try to make my desired output more friendly. – Pete900 Sep 07 '15 at 13:01
  • It is better to show a small example data and then the expected output will be easy to show. – akrun Sep 07 '15 at 13:01
  • exactly, you will need to aggregate your data in some ways ... by mean or sum or maximum, etc – Colonel Beauvel Sep 07 '15 at 13:03

1 Answer

We could split the data.frame into a list by the 'name' column and cbind the list elements. The column names can then be changed using sub or paste.

res <- do.call(cbind,split(df[-1], df$name))
colnames(res) <- sub('([^.]+)\\.([^.]+)', '\\2_\\1', colnames(res))
res
#  MeasA_A MeasB_A MeasA_B MeasB_B
#1       1       7       4      10
#2       2       8       5      11
#3       3       9       6      12
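
For the paste route mentioned above, here is a minimal sketch. The intermediate names lst and new_names are mine (not from the question), and I am assuming every piece of the split has the same measurement columns.

```r
# Rebuild the example data from the question
name  <- rep(LETTERS[1:2], each = 3)
MeasA <- 1:6
MeasB <- 7:12
df <- data.frame(name, MeasA, MeasB)

# Split the measurement columns by 'name' (hypothetical helper name: lst)
lst <- split(df[-1], df$name)
res <- do.call(cbind, lst)

# Build "<measurement>_<name>" labels in the same order cbind laid the columns out
new_names <- paste(unlist(lapply(lst, names)),
                   rep(names(lst), sapply(lst, ncol)),
                   sep = "_")
colnames(res) <- new_names
res
```

This avoids the regular expression, at the cost of keeping the split list around to read the names from.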

If we want to use dcast, we need to create a sequence column grouped by 'name'. Here, I am using dcast from the devel version of 'data.table', i.e. v1.9.5, as it can take multiple value.var columns. Instructions to install the devel version are here. We convert the 'data.frame' to a 'data.table' (setDT(df)), create the sequence column ('i1') grouped by 'name', then call dcast and specify the value.var columns.

library(data.table)#v1.9.5+
setDT(df)[, i1:= 1:.N, by = name]
dcast(df, i1~name, value.var=c('MeasA', 'MeasB'))[, i1:= NULL][]
#   MeasA_A MeasA_B MeasB_A MeasB_B
#1:       1       4       7      10
#2:       2       5       8      11
#3:       3       6       9      12

In a similar way, we can use reshape from base R. We create the sequence column using ave and use that as the 'idvar' in reshape.

df1 <- transform(df, i1= ave(seq_along(name), name, FUN=seq_along))
reshape(df1, idvar='i1', timevar='name', direction='wide')[-1]
#  MeasA.A MeasB.A MeasA.B MeasB.B
#1       1       7       4      10
#2       2       8       5      11
#3       3       9       6      12
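
To see what the sequence column actually contains, here is a small sketch of the ave call on its own. The vectors grouped and mixed are made up for illustration; the second one shows that the counter restarts per group even when the rows are interleaved.

```r
# Rows already grouped by name, as in the question
grouped <- rep(c("A", "B"), each = 3)
i_grouped <- ave(seq_along(grouped), grouped, FUN = seq_along)
i_grouped

# Rows interleaved: the within-group counter still runs 1, 2, 3 per name
mixed <- c("A", "B", "A", "B", "A", "B")
i_mixed <- ave(seq_along(mixed), mixed, FUN = seq_along)
i_mixed
```

That per-group counter is what lets reshape (or dcast) line up the first A row with the first B row, the second with the second, and so on.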
akrun
  • Ah ha, yes, lovely, that's the one. So to break it down, 'split' splits it by name (what is df[-1]?) and cbind brings them back together. I'm not sure what 'do.call' is but I can google that one. – Pete900 Sep 07 '15 at 13:13
    @Pete900 I am just removing the first column using `df[-1]` as you don't have that column in the expected output. When you are `cbind`ing list elements, use `do.call` as the `cbind` alone will not work. – akrun Sep 07 '15 at 13:14
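
To illustrate the do.call point from the comments, here is a rough sketch using the same example data. The names lst, direct and wide are mine. do.call(cbind, lst) is equivalent to writing out cbind(A = lst$A, B = lst$B) by hand, whereas passing the list itself hands cbind a single argument.

```r
# Same example data as in the question
df <- data.frame(name  = rep(LETTERS[1:2], each = 3),
                 MeasA = 1:6,
                 MeasB = 7:12)

# df[-1] drops the first column ('name'); split() cuts the rest by name
lst <- split(df[-1], df$name)

# Passing the list directly gives cbind one argument (the list itself),
# so the result is not the wide data frame we are after
direct <- cbind(lst)
is.data.frame(direct)

# do.call() spreads the list elements out as separate, named arguments
wide <- do.call(cbind, lst)
colnames(wide)
```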