2

I have data that looks like this,

posture code HR EE  a   b
cycling A03 102 100 3   6
standingA03 99  99  4   6
sitting A03 98  67  5   5
walking A03 97  78  3   6
cycling B01 111 76  5   5
standingB01 100 88  4   4
sitting B01 78  34  4   3
walking B01 99  99  2   2

I need to transpose it so it will look like the following:

code    cycling_HR  cycling_EE  cycling_a   cycling_b   standing_HR standing_EE standing_a  standing_b  sitting_HR  sitting_EE  sitting_a   sitting_b   walking_HR  walking_EE  walking_a   walking_b
A03     102    100  3       6   99          99          4   6   98  67  5   5   97  78  3   6
B01     111    76   5       5   100         88          4   4   78  34  4   3   99  99  2   2

and so on (sorry for the formatting). I was not able to find appropriate answer to clarify the issue. Any assistance would be more that welcome.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Dmitry Leykin
  • 485
  • 1
  • 7
  • 14
  • 1
    @User7598: Just because a question uses the word 'transpose' does not mean that they are using a precise mathematical definition or understanding. However, I did think that I should remove that tag since it referred to the mathematical meaning. – IRTFM Mar 24 '15 at 16:35
  • You are right... but these reshape questions are posted daily. There shouldn't be any reason someone can't find an existing answer to this. – User7598 Mar 24 '15 at 16:41
  • 1
    @User7598, Getting the terminology correct can be difficult if you have never done this kind of thing before. Many of the duplicates come from people being familiar with other terminology--pivot, cases to vars, transpose, and so on. – A5C1D2H2I1M1N2O1R2T1 Mar 25 '15 at 03:07

3 Answers3

5

This is a very basic "long to wide" reshaping problem.

You can do this in base R with the reshape function:

    reshape(mydf, direction = "wide", idvar = "code", timevar = "posture")
#   code HR.cycling EE.cycling a.cycling b.cycling HR.standing EE.standing
# 1  A03        102        100         3         6          99          99
# 5  B01        111         76         5         5         100          88
#   a.standing b.standing HR.sitting EE.sitting a.sitting b.sitting HR.walking
# 1          4          6         98         67         5         5         97
# 5          4          4         78         34         4         3         99
#   EE.walking a.walking b.walking
# 1         78         3         6
# 5         99         2         2

You can also look at a "dplyr" + "tidyr" approach, which might be something like this:

library(dplyr)
library(tidyr)
mydf %>%
  gather(var, val, HR:b) %>%
  unite(v1, posture, var) %>%
  spread(v1, val)
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
4

Or for a big data set (as reshape is veryyyy slow) you could try data.table v>=1.9.5

library(data.table)
dcast(setDT(df), code ~ posture, value.var = c("HR", "EE", "a", "b"))
#    code cycling_HR sitting_HR standing_HR walking_HR cycling_EE sitting_EE standing_EE walking_EE cycling_a sitting_a standing_a walking_a cycling_b sitting_b
# 1:  A03        102         98          99         97        100         67          99         78         3         5          4         3         6         5
# 2:  B01        111         78         100         99         76         34          88         99         5         4          4         2         5         3
#    standing_b walking_b
# 1:          6         6
# 2:          4         2

Benchmarks on slightly larger data (4 million rows):

library(dplyr)
library(tidyr)
require(data.table)
set.seed(1L)
df = data.frame(posture = c("cycling", "standing", "sitting", "walking"), 
                code = rep(paste("A", 1:1e6, sep=""), each=4L), 
                HR = sample(120, 4e6, TRUE),
                EE = sample(100, 4e6, TRUE), 
                a = sample(5, 4e6, TRUE), 
                b = sample(10, 4e6, TRUE), 
                stringsAsFactors=FALSE)

# base R approach
system.time(reshape(df, direction = "wide", idvar = "code", timevar = "posture"))
#    user  system elapsed 
#  23.183   0.470  23.838 

# dplyr + tidyr
system.time({
df %>%
  gather(var, val, HR:b) %>%
  unite(v1, posture, var) %>%
  spread(v1, val)
})
#    user  system elapsed 
#  17.312   1.046  18.446 

# data.table
system.time(dcast(setDT(df), code ~ posture, 
            value.var = c("HR", "EE", "a", "b")))
#    user  system elapsed 
#   1.216   0.136   1.367 
Arun
  • 116,683
  • 26
  • 284
  • 387
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • @akrun: Version problem? – IRTFM Mar 24 '15 at 16:36
  • @DavidArenburg I updated the data.table devel version only a few days ago – akrun Mar 24 '15 at 16:40
  • 1
    @DavidArenburg Thanks, found the problem. Your code works fine.. (+1) – akrun Mar 24 '15 at 16:42
  • With reshape2 v.1.4.1 and previous devel version of data.table 1.9.4, on Mac SL fork, I'm getting the same error that @akrun reports with that dataset. – IRTFM Mar 24 '15 at 16:43
  • @BondedDust, this won't work with `data.table 1.9.4`. And you don't need to load `reshape2` at all. – David Arenburg Mar 24 '15 at 16:44
  • @BondedDust Usually, I load `library(reshape2)` before `library(data.table)`. This time I loaded reshape2 after loading data.table for testing another code. It got solved by another fresh session – akrun Mar 24 '15 at 16:44
  • @DavidArenburg: Works with new version (and reshape2::dcast gets masked.) I think you should post the github install code and a warning that a toolchain for compiling C code will be needed. – IRTFM Mar 24 '15 at 16:49
  • @BondedDust I've provided a link on how to install the devel in the answer itself. Not sure why should I warn about the last part, do you see any possible issue with it? – David Arenburg Mar 24 '15 at 16:51
  • The range of sophistication for users of SO is rather wide. I put a warning for Mac users on that github page. – IRTFM Mar 24 '15 at 16:55
  • 1
    Thanks for the BM (and this great new functuanality) @Arun – David Arenburg Mar 29 '15 at 21:43
0

Using tidyr?

library(tidyr)

x<-data.frame(posture=c("cycling", "standing", "sitting", "walking"),
           code=c("A03", "A03", "B01", "B01"),
           HR=c(1,3,3,4),
           EE=c(1,3,3,5))

x2<-gather(x, key=type, value=vals, -c(code, posture))
x2$vars<-paste(x2$posture, x2$type, sep="_")

x2<-select(x2, -c(posture, type))
spread(x2, key=vars, value=vals)
ZRoss
  • 1,437
  • 1
  • 15
  • 32