16

I know the wide to long has been asked way too many times on here but I can't figure out how to turn the following into long format. Shoot I even asked one of the wide to long with 2 repeated measures on SO. I'm becoming frustrated with my inability to convert my data. How can I turn this (variable order doesn't matter):

      id trt    work.T1   play.T1   talk.T1   total.T1    work.T2    play.T2   talk.T2  total.T2
1   x1.1 cnt 0.34434350 0.7841665 0.1079332 0.88803151 0.64836951 0.87954320 0.7233519 0.5630988
2   x1.2  tr 0.06132255 0.8426960 0.3338658 0.04685878 0.23478670 0.19711687 0.5164015 0.7617968
3   x1.3  tr 0.36897981 0.1834721 0.3241316 0.76904051 0.07629721 0.06945971 0.4118995 0.7452974
4   x1.4  tr 0.40759356 0.5285396 0.5654258 0.23022542 0.92309504 0.15733957 0.4132653 0.7078273
5   x1.5 cnt 0.91433676 0.7029476 0.2031782 0.31518412 0.14721669 0.33345678 0.7620444 0.9868082
6   x1.6  tr 0.88870525 0.9132728 0.2197045 0.28266959 0.82239037 0.18006177 0.2591765 0.4516309
7   x1.7 cnt 0.98373218 0.2591739 0.6331153 0.71319565 0.41351839 0.14648269 0.7631898 0.1182174
8   x1.8  tr 0.47719528 0.7926248 0.3525205 0.86213792 0.61252061 0.29057544 0.9824048 0.2386353
9   x1.9  tr 0.69350823 0.6144696 0.8568732 0.10632352 0.06812050 0.93606889 0.6701190 0.4705228
10 x1.10 cnt 0.42574646 0.7006205 0.9507216 0.55032776 0.90413220 0.10246047 0.5899279 0.3523231

into this:

      id trt time       work       play      talk      total
1   x1.1 cnt    1 0.34434350 0.78416653 0.1079332 0.88803151
2   x1.2  tr    1 0.06132255 0.84269599 0.3338658 0.04685878
3   x1.3  tr    1 0.36897981 0.18347215 0.3241316 0.76904051
4   x1.4  tr    1 0.40759356 0.52853960 0.5654258 0.23022542
5   x1.5 cnt    1 0.91433676 0.70294755 0.2031782 0.31518412
6   x1.6  tr    1 0.88870525 0.91327276 0.2197045 0.28266959
7   x1.7 cnt    1 0.98373218 0.25917392 0.6331153 0.71319565
8   x1.8  tr    1 0.47719528 0.79262477 0.3525205 0.86213792
9   x1.9  tr    1 0.69350823 0.61446955 0.8568732 0.10632352
10 x1.10 cnt    1 0.42574646 0.70062053 0.9507216 0.55032776
11  x1.1 cnt    2 0.64836951 0.87954320 0.7233519 0.56309884
12  x1.2  tr    2 0.23478670 0.19711687 0.5164015 0.76179680
13  x1.3  tr    2 0.07629722 0.06945971 0.4118995 0.74529740
14  x1.4  tr    2 0.92309504 0.15733957 0.4132653 0.70782726
15  x1.5 cnt    2 0.14721669 0.33345678 0.7620444 0.98680824
16  x1.6  tr    2 0.82239038 0.18006177 0.2591765 0.45163091
17  x1.7 cnt    2 0.41351839 0.14648269 0.7631898 0.11821741
18  x1.8  tr    2 0.61252061 0.29057544 0.9824048 0.23863532
19  x1.9  tr    2 0.06812050 0.93606889 0.6701190 0.47052276
20 x1.10 cnt    2 0.90413220 0.10246047 0.5899279 0.35232307

The Data Set

id <- paste('x', "1.", 1:10, sep="")
set.seed(10)
DF <- data.frame(id, trt=sample(c('cnt', 'tr'), 10, T), work.T1=runif(10),
    play.T1=runif(10), talk.T1=runif(10), total.T1=runif(10),
    work.T2=runif(10), play.T2=runif(10), talk.T2=runif(10), 
    total.T2=runif(10))

Thank you in advance!

EDIT: Something screwy happened when I was using set.seed (certainly an error I did). The actually data above is not the data you'll get if you use set.seed(10). I'm leaving the error for historical accuracy and it really doesn't affect the solutions people gave.

Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519

5 Answers5

10

This is pretty close and changing the names of columns should be within your skillset:

reshape(DF, 
       varying=c(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
       direction="long")

EDIT: Adding a version that is almost an exact solution:

reshape(DF, varying=list(work= c(3, 7), play= c(4,8), talk= c(5,9), total= c(6,10) ), 
        v.names=c("Work", "Play", "Talk", "Total"), 
          # that was needed after changed 'varying' arg to a list to allow 'times' 
        direction="long",  
        times=1:2,        # substitutes number for T1 and T2
        timevar="times")  # to name the time col
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • 1
    thank you so much! I was close to that but didn't quite understand what I was doing. Very easy and I learned a bit more about reshape in base (powerful yet sometimes confusing function. – Tyler Rinker Mar 13 '12 at 13:45
  • I completely agree that it can be confusing. Like Ben Bolker, I used to just use reshape::melt. – IRTFM Mar 13 '12 at 13:52
  • just came back here (I wish SO told you if someone edit's their answer; just a suggestion that you add a comment when you do so SO informs the posters) and I liked your answer before. I really like it now. The annotations are terrific. Thank you! – Tyler Rinker Mar 15 '12 at 13:51
8

The most concise way is to use tidyr combined with dplyr library.

library(tidyr)
library(dplyr)
result <- DF %>%
  # transfer to 'long' format
  gather(loc, value, work.T1:total.T2) %>%
  # separate the column into location and time
  separate(loc, into = c('loc', 'time'), '\\.') %>%
  # transfer to 'short' format
  spread(loc, value) %>%
  mutate(time = as.numeric(substr(time, 2, 2))) %>%
  arrange(time)

tidyr is designed specifically to make data tidy.

Mr369
  • 384
  • 4
  • 17
Yingsen Mao
  • 81
  • 1
  • 2
  • 2
    I wouldn't call that concise in comparison to other solutions. – Tyler Rinker Sep 19 '15 at 00:21
  • 2
    This is superior to the solutions that use `reshape` -- easier to read thanks to its functional style (the `%>%` operator) and more performant too, I think, due to key portions being written in C++. – GregT Dec 20 '15 at 07:07
  • 1
    what is the `enter code here` part supposed to be? @Yingsen Mao – vashts85 Jul 18 '17 at 14:52
  • 1
    @GregT - By my testing, the `reshape(DF, varying=3:10, sep=".T", direction="long")` code from @A5... is about 10x faster than this pipeline when dealing with 1M records - 22 secs vs 2.6 secs. Not the end of the world, but it's not higher-performing. – thelatemail Aug 27 '18 at 23:22
3

Oddly enough I don't seem to get the same numbers as you (which I should since we both used set.seed(10)?) but otherwise this seems to do the trick:

library(reshape)  #this might work with reshape2 as well, I haven't tried ...
DF2 <- melt(DF,id.vars=1:2)
## split 'activity.time' label into two separate variables
DF3 <- cbind(DF2,
             colsplit(as.character(DF2$variable),"\\.",
                      names=c("activity","time")))
## rename time, reorder factors:
DF4 <- transform(DF3,
                 time=as.numeric(gsub("^T","",time)),
                 activity=factor(activity,
                   levels=c("work","play","talk","total")),
                 id=factor(id,levels=paste("x1",1:10,sep=".")))
## reshape back to wide
DF5 <- cast(subset(DF4,select=-variable),id+trt+time~activity)
## reorder
DF6 <- with(DF5,DF5[order(time,id),])

It's more complicated than @DWin's answer but maybe (?) more general.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thank you Ben. I am adding both your response and Dwin's to my (now over 200 pages) R notes. Both responses were very insightful depending on the situation. +1 – Tyler Rinker Mar 13 '12 at 13:47
3

If you really didn't want the "T" in the "time" variable in the output, couldn't you simply do the following?

names(DF) = sub("T", "", names(DF))
reshape(DF, direction="long", varying=3:10)

Or, without changing names(DF), you could simply set the sep= argument to include "T":

reshape(DF, direction="long", varying=3:10, sep=".T")

I'm a bit confused, though. As Ben Bolker pointed out a in his comment, your "dataset code" doesn't provide the same numbers as what you have. Also, the output of DWin and mine matches perfectly, but it does not match with the "into this" output that you have in your original question.

I checked this by creating one data frame named "DWin" with his results, and one data frame named "mine" with my results and compared them using DWin == mine.

Can you verify that the output we've gotten is actually what you needed?

Community
  • 1
  • 1
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • Yes the output you and DWin got is correct. Problem with my use of set.seed. I apologize. Great solution. very little code. – Tyler Rinker Mar 15 '12 at 14:30
0

Another way to approach the problem that requires very little code but would likely be slower,:

DF.1 <- DF[, 1:2]
DF.2 <- DF[, 3:6] 
DF.3 <- DF[, 7:10]

names(DF.2) <- names(DF.3) <- unlist(strsplit(names(DF.2), ".", fixed=T))[c(T,F)]
time <- rep(1:2, each=nrow(DF.1))
data.frame(rbind(DF.1, DF.1), time, rbind(DF.2, DF.3))
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519