1

The title may be a little ambiguous, let me explain what I'm trying to do
This is my data frame:

Person<-c("Andrew","John","Mike")
Age<-c(25,34,21)
ColA_1<-c(1,5,7)
ColA_2<-c(5,0,9)
ColA_3<-c(4,4,1)
ColB_1<-c(16,55,37)
ColB_2<-c(25,14,39)
ColB_3<-c(43,64,31)

df<-data.frame(Person=Person,Age=Age,
    ColA_1=ColA_1,
    ColA_2=ColA_2,
    ColA_3=ColA_3,
    ColB_1=ColB_1,
    ColB_2=ColB_2,
    ColB_3=ColB_3)

What I would like to achieve is a new dataframe in long format, with columns B (ColB_1,ColB_2,ColB_3) beneath columns A, while duplicating value for Person and Age. In SPSS this would be the equivalent of VARSTOCASES. The output data frame woulda have 6 columns: two for Person and Age, three for the collapsed A and B columns and one indicating if the row reffers to A or B data.

I've meddled with melt function from {reshape2} package, but it seems it can only summarize the data into a single column instead of three. My only solution would be create two dataframes, first with A columns, second with B and the rbind() them together. However, I need to achieve this for a very complex data set and I'm hoping for an easier and more elegant solution.

The expected result is:

  Person Age Index Col_1 Col_2 Col_3
1 Andrew  25     A     1     5     4
2   John  34     A     5     0     4
3   Mike  21     A     7     9     1
4 Andrew  25     B    16    25    43
5   John  34     B    55    14    64
6   Mike  21     B    37    39    31
BogdanC
  • 1,316
  • 3
  • 16
  • 36

2 Answers2

4

Using the devel version of data.table i.e. v1.9.5, this could be done with melt. You can install it by following these instructions.

  library(data.table) #data.table_1.9.5
  dM <- melt(setDT(df), measure=list(c(3,6), c(4,7), c(5,8)), 
         value.name=c('Col_1', 'Col_2', 'Col_3'),
          variable.name='Index')[, Index:=LETTERS[Index]][]
  dM
  #   Person Age Index Col_1 Col_2 Col_3
  #1: Andrew  25     A     1     5     4
  #2:   John  34     A     5     0     4
  #3:   Mike  21     A     7     9     1
  #4: Andrew  25     B    16    25    43
  #5:   John  34     B    55    14    64
  #6:   Mike  21     B    37    39    31

Or you could use merged.stack from splitstackshape

  library(splitstackshape)
  library(data.table)
  setnames(merged.stack(df, var.stubs=c('1', '2', '3'), 
      sep='var.stubs', atStart=FALSE)[, .time_1:= substr(.time_1,4,4)], 
     3:6, c('Index', paste0("Col_",1:3)))[order(Index)]
 #   Person Age Index Col_1 Col_2 Col_3
 #1: Andrew  25     A     1     5     4
 #2:   John  34     A     5     0     4
 #3:   Mike  21     A     7     9     1
 #4: Andrew  25     B    16    25    43
 #5:   John  34     B    55    14    64
 #6:   Mike  21     B    37    39    31
akrun
  • 874,273
  • 37
  • 540
  • 662
2

reshape function is good here.

reshape(df, varying=list(c(3,6), c(4,7), c(5,8)), 
            times=c("A","B"), v.names=paste0("Col_",1:3), direction="long")

data

df <- 
structure(list(Person = structure(1:3, .Label = c("Andrew", "John", 
"Mike"), class = "factor"), Age = c(25, 34, 21), ColA_1 = c(1, 
5, 7), ColA_2 = c(5, 0, 9), ColA_3 = c(4, 4, 1), ColB_1 = c(16, 
55, 37), ColB_2 = c(25, 14, 39), ColB_3 = c(43, 64, 31)), .Names = c("Person", 
"Age", "ColA_1", "ColA_2", "ColA_3", "ColB_1", "ColB_2", "ColB_3"
), row.names = c(NA, -3L), class = "data.frame")
user20650
  • 24,654
  • 5
  • 56
  • 91