1

Say I have the following wide/messy dataframe:

df1 <- data.frame(ID = c(1, 2), Gender = c("M","F"),
       Q1 = c(1, 5), Q2 = c(2, 6),
       Q3 = c(3, 7), Q4 = c(4, 8))

 ID Gender Q1 Q2 Q3 Q4
 1      M  1  2  3  4
 2      F  5  6  7  8

how can I turn it into this dataframe:

df2 <- data.frame(ID = c(1, 1, 2, 2), Gender = c("M", "M", "F", "F"),
       V1 = c(1, 3, 5, 7), V2 = c(2, 4, 6, 8))

 ID Gender V1 V2
 1      M  1  2
 1      M  3  4
 2      F  5  6
 2      F  7  8

I know there are multiple packages and functions (e.g., tidyr, reshape2, reshape function) that can accomplish this. Which is the easiest way to do it and how? Really appreciate any help anyone can provide. Thanks!

hsl
  • 670
  • 2
  • 10
  • 22
  • Thanks a lot @akrun! I'll stick with reshape for now because I'm still trying to learn data.table. – hsl Apr 19 '15 at 19:37

1 Answers1

3

You could try melt from the devel version of data.table i.e v1.9.5. It can take multiple variables in the measure.vars as a list. Instructions to install the devel version are here

library(data.table)#v1.9.5+
melt(setDT(df1), measure.vars=list(c(3,5), c(4,6)), 
  value.name=c('V1', 'V2'))[,variable:=NULL][order(ID)]
#   ID Gender V1 V2
#1:  1      M  1  2
#2:  1      M  3  4
#3:  2      F  5  6
#4:  2      F  7  8

Or use reshape from base R

res <- subset(reshape(df1, idvar=c('ID', 'Gender'), 
       varying=list(c(3,5), c(4,6)), direction='long'), select=-time)
row.names(res) <- NULL

Update

If we need to transform back the 'df2' to 'df1', dcast from data.table could be used. It can take multiple value.var columns. We need to create a sequence column (N) by group ('ID', 'Gender') before proceeding with dcast

 dcast(setDT(df2)[, N:=1:.N, list(ID, Gender)], ID+Gender~N, 
           value.var=c('V1', 'V2'))
 #   ID Gender 1_V1 2_V1 1_V2 2_V2
 #1:  1      M    1    3    2    4
 #2:  2      F    5    7    6    8

Or we create a sequence by group with ave and then use reshape from base R.

  df2 <- transform(df2, N= ave(seq_along(ID), ID, Gender, FUN=seq_along))
 reshape(df2, idvar=c('ID', 'Gender'), timevar='N', direction='wide')
 #   ID Gender V1.1 V2.1 V1.2 V2.2
 #1  1      M    1    2    3    4
 #3  2      F    5    6    7    8

data

df1 <- data.frame(ID = c(1, 2), Gender = c("M","F"), Q1 = c(1, 5), 
       Q2 = c(2, 6), Q3 = c(3, 7), Q4 = c(4, 8))

df2 <- data.frame(ID = c(1, 1, 2, 2), Gender = c("M", "M", "F", "F"),
   V1 = c(1, 3, 5, 7), V2 = c(2, 4, 6, 8))
akrun
  • 874,273
  • 37
  • 540
  • 662