Reshaping a data frame with more than one measure variable

Question

I'm using a data frame similar to this one:

df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                 quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                 quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

print(df)  

   student month quiz1p1 quiz1p2 quiz2p1 quiz2p2  
1     1     1    20.0    30.0    80.0    90.0  
2     1     2    20.1    30.1    80.1    90.1  
3     1     3    20.2    30.2    80.2    90.2  
4     1     4    20.3    30.3    80.3    90.3
5     1     5    20.4    30.4    80.4    90.4
6     2     1    20.5    30.5    80.5    90.5
7     2     2    20.6    30.6    80.6    90.6
8     2     3    20.7    30.7    80.7    90.7
9     2     4    20.8    30.8    80.8    90.8
10    2     5    20.9    30.9    80.9    90.9

It describes grades received by students over five months, in two quizzes divided into two parts each.

I need to get the two quizzes into separate rows – so that each student in each month will have two rows, one for each quiz, and two columns – for each part of the quiz. When I melt the table:

library(reshape2)
dfL <- melt(df, id.vars = c("student", "month"))

I get the two parts of the quiz in separate lines too.

dcast(dfL,student+month~variable)

of course gets me right back where I started, and I can't find a way to cast the table back into the required form. Is there a way to make the melt command work something like this:

melt.data.frame(df, measure.var1=c("quiz1p1","quiz2p1"), 
                measure.var2=c("quiz1p2","quiz2p2"))

Sample data, clear question. +1. Nice work for a person asking first question. Welcome to SO. — Roman Luštrik, Oct 11 '12 at 12:09
Related question: http://stackoverflow.com/questions/27247078/reshape-multiple-values-at-once-in-r — landroni, Oct 15 '15 at 16:26

Josh O'Brien · Accepted Answer · 2012-10-11T13:44:19.560

12

Here's how you could do this with reshape(), from base R:

df2 <- reshape(df, direction="long",
               idvar = 1:2, varying = list(c(3,5), c(4,6)),
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))

## Checking the output    
rbind(head(df2, 3), tail(df2, 3))
#           student month  time   p1   p2
# 1.1.quiz1       1     1 quiz1 20.0 30.0
# 1.2.quiz1       1     2 quiz1 20.1 30.1
# 1.3.quiz1       1     3 quiz1 20.2 30.2
# 2.3.quiz2       2     3 quiz2 80.7 90.7
# 2.4.quiz2       2     4 quiz2 80.8 90.8
# 2.5.quiz2       2     5 quiz2 80.9 90.9
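If you want the two quiz rows for each student and month to sit next to each other, as the question describes, you can sort the long result afterwards. A minimal sketch, rebuilding the question's df so it runs on its own; the ordering step and the row-name reset are additions for illustration, not part of the answer above:

```r
# Rebuild the sample data from the question
df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                 quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                 quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

# Same reshape() call as in the answer
df2 <- reshape(df, direction = "long",
               idvar = 1:2, varying = list(c(3, 5), c(4, 6)),
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))

# Sort so that each student/month pair has its quiz1 and quiz2 rows adjacent,
# then drop the long composite row names
df2 <- df2[order(df2$student, df2$month, df2$time), ]
rownames(df2) <- NULL

head(df2, 4)
#   student month  time   p1   p2
# 1       1     1 quiz1 20.0 30.0
# 2       1     1 quiz2 80.0 90.0
# 3       1     2 quiz1 20.1 30.1
# 4       1     2 quiz2 80.1 90.1
```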

You can also use column names (instead of column numbers) for idvar and varying. It's more verbose, but seems like better practice to me:

## The same operation as above, using just column *names*
df2 <- reshape(df, direction="long", idvar=c("student", "month"),
               varying = list(c("quiz1p1", "quiz2p1"), 
                              c("quiz1p2", "quiz2p2")), 
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
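With many parts per quiz, listing every column in varying gets tedious. One possible shortcut, sketched here on the assumption that you are free to rename the columns first: reshape() can guess v.names and times by itself when the column names follow a name, separator, time pattern. The renaming regex and the names df_renamed and df_long are illustrative only.

```r
# Rebuild the sample data from the question
df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                 quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                 quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

# Rename "quiz1p1" to "p1.quiz1" and so on, so that the part before the dot
# is the measure name and the part after the dot is the "time" (the quiz)
df_renamed <- df
names(df_renamed) <- sub("^(quiz[0-9]+)(p[0-9]+)$", "\\2.\\1", names(df_renamed))

# With v.names and times omitted, reshape() splits the varying names on `sep`
df_long <- reshape(df_renamed, direction = "long",
                   idvar = c("student", "month"),
                   varying = names(df_renamed)[3:6], sep = ".")

head(df_long, 3)
```

The trade-off is that the quiz and part labels have to be recoverable from the column names by a single pattern; if the names are irregular, the explicit list form shown above is safer.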

edited Oct 11 '12 at 13:44

answered Oct 11 '12 at 13:35


Thanks for that answer. Nice illustration of the use of 'v.names' and 'times'. – IRTFM Oct 11 '12 at 18:37
@DWin -- Sure 'nuff. I think you and I are the main proponents for plain old `reshape()` around these parts. (I can't think of any R function with a more opaque man page (or less helpful examples), so learning to use it entails a really steep learning curve.) – Josh O'Brien Oct 11 '12 at 18:45
Exactly. I think this problem and answer would be a good addition to the Examples section for the help page. – IRTFM Oct 11 '12 at 19:03
I've thought before of working up an alternative Examples section for `reshape()`. Is there any realistic route for something like that making its way into a base package? – Josh O'Brien Oct 11 '12 at 19:06
Every time there is a complaint about its help page, somebody from RCore says they would welcome improvements. You are not asking to change base-R, only to improve its 'doco'. So submit to r-devel and see what reception you get. – IRTFM Oct 11 '12 at 19:11
@DWin -- Cool. I'll take a look at it, and then post suggested code (with rationale for the changes) to R-devel. Will start by scanning through `package:datasets` for datasets that are smaller or more appropriate than `Indometh` or `state.x77`. – Josh O'Brien Oct 11 '12 at 19:16
I got some very nice work-around solutions to the question, but I thought there should be a built-in solution. Thanks! Also, thanks for preparing the two versions; that is very helpful in understanding how this function works. – eli-k Oct 12 '12 at 06:49
One more question though - I would have thought melt and cast, coming from more advanced packages, should be able to do this too? – eli-k Oct 12 '12 at 06:56
@eli-k -- It might be better to think of Hadley's packages as more user-friendly rather than more advanced. The R core team includes some pretty impressive and experienced programmers, and a function like `reshape()` has been around long enough that it's become pretty well battle-hardened. So, in a number of ways that count, core R includes much of the most 'advanced' code in the R universe. (By the way, glad you appreciated this alternative solution.) – Josh O'Brien Oct 12 '12 at 07:06
@Josh O'Brien, this info is becoming very useful in choosing between a variety of available functions in different packages. thanks for your help! – eli-k Oct 16 '12 at 15:16

score 7 · Answer 2 · answered Oct 11 '12 at 11:52

I think this does what you want:

#Break variable into two columns, one for the quiz and one for the part of the quiz
dfL <- transform(dfL, quiz = substr(variable, 1,5), 
                 part = substr(variable, 6,7))

#Adjust your dcast call:
dcast(dfL, student + month + quiz ~ part)
#-----
   student month  quiz   p1   p2
1        1     1 quiz1 20.0 30.0
2        1     1 quiz2 80.0 90.0
3        1     2 quiz1 20.1 30.1
...
18       2     4 quiz2 80.8 90.8
19       2     5 quiz1 20.9 30.9
20       2     5 quiz2 80.9 90.9
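The snippet above relies on a dfL that was created earlier by melting df, and on the names having fixed widths. A self-contained sketch of the same idea follows, assuming reshape2 is installed; it swaps substr() for regex-based sub() calls so that names such as quiz10p3 would also split correctly. The patterns and the name wide are illustrative, not from the answer.

```r
library(reshape2)

# Rebuild the sample data from the question
df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                 quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                 quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

# Step 1: melt everything except the id columns into variable/value pairs
dfL <- melt(df, id.vars = c("student", "month"))

# Step 2: split "quiz1p1" into a quiz label ("quiz1") and a part label ("p1")
dfL <- transform(dfL,
                 quiz = sub("p[0-9]+$", "", variable),
                 part = sub("^quiz[0-9]+", "", variable))

# Step 3: cast back to wide, one row per student/month/quiz, one column per part
wide <- dcast(dfL, student + month + quiz ~ part, value.var = "value")
head(wide, 3)
```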

Chase


Thanks for this great solution @Chase. Although I prefer the built-in solution in general, your solution looks like it would require less code for more complex data frames. For example, if each quiz were divided into six parts, I wouldn't have to add anything to your code, while I would have to write six pairs of column names in the reshape function. – eli-k Oct 12 '12 at 07:18

score 3 · Answer 3 · edited May 23 '17 at 11:53

There was a very similar question asked about half a year ago, for which I wrote the following function:

melt.wide = function(data, id.vars, new.names) {
  require(reshape2)
  require(stringr)
  data.melt = melt(data, id.vars=id.vars)
  new.vars = data.frame(do.call(
    rbind, str_extract_all(data.melt$variable, "[0-9]+")))
  names(new.vars) = new.names
  cbind(data.melt, new.vars)
}
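The key step in the function above is pulling the runs of digits out of each variable name and binding them into new columns. Here is a rough base-R sketch of just that step, using regmatches() and gregexpr() in place of stringr's str_extract_all(); the vector vars is made up for illustration and stands in for data.melt$variable:

```r
# Variable names as they would appear in the melted data
vars <- c("quiz1p1", "quiz1p2", "quiz2p1", "quiz2p2")

# Extract every run of digits from each name (one character vector per name)
digits <- regmatches(vars, gregexpr("[0-9]+", vars))

# Stack the per-name vectors into a matrix, then convert to a data frame
new.vars <- data.frame(do.call(rbind, digits), stringsAsFactors = FALSE)
names(new.vars) <- c("Quiz", "Part")
new.vars
#   Quiz Part
# 1    1    1
# 2    1    2
# 3    2    1
# 4    2    2
```

Note that this only works cleanly when every name contains the same number of digit runs, since rbind() needs vectors of equal length.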

You can use the function to "melt" your data as follows:

dfL <-melt.wide(df, id.vars=1:2, new.names=c("Quiz", "Part"))
head(dfL)
#   student month variable value Quiz Part
# 1       1     1  quiz1p1  20.0    1    1
# 2       1     2  quiz1p1  20.1    1    1
# 3       1     3  quiz1p1  20.2    1    1
# 4       1     4  quiz1p1  20.3    1    1
# 5       1     5  quiz1p1  20.4    1    1
# 6       2     1  quiz1p1  20.5    1    1
tail(dfL)
#    student month variable value Quiz Part
# 35       1     5  quiz2p2  90.4    2    2
# 36       2     1  quiz2p2  90.5    2    2
# 37       2     2  quiz2p2  90.6    2    2
# 38       2     3  quiz2p2  90.7    2    2
# 39       2     4  quiz2p2  90.8    2    2
# 40       2     5  quiz2p2  90.9    2    2

Once the data are in this form, you can much more easily use dcast() to get whatever form you desire. For example

head(dcast(dfL, student + month + Quiz ~ Part))
#   student month Quiz    1    2
# 1       1     1    1 20.0 30.0
# 2       1     1    2 80.0 90.0
# 3       1     2    1 20.1 30.1
# 4       1     2    2 80.1 90.1
# 5       1     3    1 20.2 30.2
# 6       1     3    2 80.2 90.2

Thanks for suggesting this solution @mrdwab. It took me a while to understand how it's supposed to work, but now that I get it I can see how your function and your general approach to the problem can be useful in this situation and in other situations too. — eli-k, Oct 12 '12 at 07:39
@eli-k, don't forget that for most functions, you can simply write the function name at the console (eg `> reshape`) to see the code that powers them. Then, you can run different parts of the function, seeing what is done at each step. That can be a useful way to learn some fun coding tricks. — A5C1D2H2I1M1N2O1R2T1, Oct 12 '12 at 08:22
