18

I have a data frame that I melted using the reshape package that I would like to "un melt".

here is a toy example of the melted data (real data frame is 500x100 or larger) :

variable<-c(rep("X1",3),rep("X2",3),rep("X3",3))
value<-c(rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3))
dat <-data.frame(variable,value)
dat
 variable     value
1       X1 0.5285376
2       X1 0.5285376
3       X1 0.5285376
4       X2 0.1694908
5       X2 0.1694908
6       X2 0.1694908
7       X3 0.7446906
8       X3 0.7446906
9       X3 0.7446906

Each variable (X1, X2,X3) has values estimated at 3 different times (which in this toy example happen to be the same, but this is never the case).

I would like to get it (back) in the form of :

     X1        X2        X3
1 0.5285376 0.1694908 0.7446906
2 0.5285376 0.1694908 0.7446906
3 0.5285376 0.1694908 0.7446906

Basically, I would like the variable column to be sorted on ID (X1, X2 etc) and become column headings. I have tried various permutations of cast, dcast, recast, etc.. and cant seem to get the data in the format that I want. It was easy enough to 'melt' data from the wide form to the longer form (e.g. the dat datset), but getting it back is proving difficult. Any ideas? I know this is relatively simple, but I am having a hard time conceptualizing how to do this in reshape or reshape2.

Thanks, LP

LP_640
  • 579
  • 1
  • 5
  • 17
  • It's a little tricky here because in your melted data you've lost the information about which row it was originally in, that is at which of the three times it was estimated at. If you're willing to assume that it was always 1-2-3 in order (or 1-2, if there are only two), you can make a row with that information and then cast it back. – Aaron left Stack Overflow Sep 19 '14 at 14:05

2 Answers2

24

I typically do this by creating an id column and then using dcast:

> dat
  variable     value
1       X1 0.4299397
2       X1 0.4299397
3       X1 0.4299397
4       X2 0.2531551
5       X2 0.2531551
6       X2 0.2531551
7       X3 0.3972119
8       X3 0.3972119
9       X3 0.3972119
> dat$id <- rep(1:3,times = 3)
> dcast(data = dat,formula = id~variable,fun.aggregate = sum,value.var = "value")
  id        X1        X2        X3
1  1 0.4299397 0.2531551 0.3972119
2  2 0.4299397 0.2531551 0.3972119
3  3 0.4299397 0.2531551 0.3972119
joran
  • 169,992
  • 32
  • 429
  • 468
  • 2
    As they mention that the number of values might not be consistent, maybe consider changing your `dat$id` to something like `with(dat, ave(rep(1, nrow(dat)), variable, FUN = seq_along))`. – A5C1D2H2I1M1N2O1R2T1 Sep 19 '14 at 15:23
  • ahh, thanks. I was missing the id. This works well for this particular data set which was generated via simulations. – LP_640 Sep 19 '14 at 18:09
1

Depending on how robust you need this to be , the following will correctly cast for varying number of occurrences of variables (and in any order).

> variable<-c(rep("X1",5),rep("X2",4),rep("X3",3))
> value<-c(rep(rnorm(1,.5,.2),5),rep(rnorm(1,.5,.2),4),rep(rnorm(1,.5,.2),3))
> dat <-data.frame(variable,value)
> dat <- dat[order(rnorm(nrow(dat))),]
> dat
   variable     value
11       X3 1.0294454
8        X2 0.6147509
2        X1 0.3537012
7        X2 0.6147509
9        X2 0.6147509
5        X1 0.3537012
4        X1 0.3537012
12       X3 1.0294454
3        X1 0.3537012
1        X1 0.3537012
10       X3 1.0294454
6        X2 0.6147509
> dat$id = numeric(nrow(dat))
> for (i in 1:nrow(dat)){
+   dat_temp <- dat[1:i,]
+   dat[i,]$id <- nrow(dat_temp[dat_temp$variable == dat[i,]$variable,])
+ }
> cast(dat, id~variable, value = 'value')
  id        X1        X2       X3
1  1 0.3537012 0.6147509 1.029445
2  2 0.3537012 0.6147509 1.029445
3  3 0.3537012 0.6147509 1.029445
4  4 0.3537012 0.6147509       NA
5  5 0.3537012        NA       NA
Leo
  • 1,773
  • 12
  • 19