1

I've been reading through the documentation on the reshape2 package in R and other packages for restructuring data from the wide to long formats and vice versa. However, I'm stuck on this problem because I need to create a new variable for time and group my hormone measurements by time. I previously did this in SPSS, though I am making the switch to R for obvious reasons like many others. I know there must be an easy way to do this in R, but I'm having trouble figuring this out.

The data from this project is from a longitudinal clinical study in which 20 different hormones were measured at 5 different time points for each patient (example made-up data is below - a1 is hormone 'a' at visit 1, a2 is hormone 'a' at visit 2 and so on). There are 20 patients total in the study, each with unique identifiers in the spreadsheet (id). The hormone data (hormone 'a', hormone 'b', etc) is arranged as follows in wide form in my spreadsheet:

> id     a1     a2     a3     a4     a5    b1     b2     b3     b4     b5...
>  1     21     50     28     19     15    24     90     40     35     20...
>  2     23     45     15     22     20    25     45     34     31     22...
>  3     29     88     33     32     21    78     32     33     45     21...
...

When I previously did this in SPSS, the software prompts me for id variable as well as variable names to collapse the longitudinal measurements into. I would create a new variable called "visit" which is from 1 to 5, which corresponded to the 5 measurements I have on each hormone. When I did that in SPSS it creates a new output in the long format that looks like this:

> id     visit      a     b
> 1          1     21    24
> 1          2     50    90
> 1          3     28    40
> 1          4     19    35
> 1          5     15    20
> 2          1     23    25
> 2          2     45    45
> 2          3     15    34
> 2          4     22    31
> 2          5     20    22
> 3          1     29    78
...

I've tried using reshape, and the function appears to work but when I look at the actual data the numbers are getting mixed up between the wide and long formats. I must be doing something very basic wrong, but I having difficulty figuring it out.

d_long <- reshape(d, varying = c("a1", "a2", "a3", "a4", "a5", 
    "b1", "b2", "b3", "b4", "b5"), v.names = c("a", "b"), 
    idvar = "id", times = c(1:5), direction = "long")
Vance L Albaugh
  • 115
  • 1
  • 6

3 Answers3

0

I think you should use the reshape() function and specify the new time variable with the argument timevar.

JonB
  • 156
  • 4
0

This could be easily done with the melt from the devel version of data.table. Instructions to install the devel version is here.

We convert the 'data.frame' to 'data.table' (setDT(df1)). Then, we specify the patterns in the measure argument of melt to match the multiple measure columns.

library(data.table)
dM <- melt(setDT(df1), measure=patterns(c('^a\\d+', '^b\\d+')),
          value.name=c('a', 'b'), variable.name='visit')[order(id)]
dM
#    id visit  a  b
# 1:  1     1 21 24
# 2:  1     2 50 90
# 3:  1     3 28 40
# 4:  1     4 19 35
# 5:  1     5 15 20
# 6:  2     1 23 25
# 7:  2     2 45 45
# 8:  2     3 15 34
# 9:  2     4 22 31
#10:  2     5 20 22
#11:  3     1 29 78
#12:  3     2 88 32
#13:  3     3 33 33
#14:  3     4 32 45
#15:  3     5 21 21

data

df1 <- structure(list(id = 1:3, a1 = c(21L, 23L, 29L), a2 = c(50L, 45L, 
88L), a3 = c(28L, 15L, 33L), a4 = c(19L, 22L, 32L), a5 = c(15L, 
20L, 21L), b1 = c(24L, 25L, 78L), b2 = c(90L, 45L, 32L), b3 = c(40L, 
34L, 33L), b4 = c(35L, 31L, 45L), b5 = c(20L, 22L, 21L)), .Names = c("id", 
"a1", "a2", "a3", "a4", "a5", "b1", "b2", "b3", "b4", "b5"),
class = "data.frame", row.names = c(NA, -3L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I'm not sure what you mean by the devel version? I'm relatively new to R and still trying to figure this all out – Vance L Albaugh Aug 18 '15 at 17:30
  • akrun - I can't access the help documents (?corrupted) and I can't figure out how to take that code and adapt it to my needs that are 20 variables at 5 time points each. Can you tell me how the "measure=patters" argument should say generally speaking? – Vance L Albaugh Aug 18 '15 at 20:02
  • @VanceAlbaugh It means the version that is under development. If you have looked at the link, it gives instructions to install it. Once it is installed, load the package (`library(data.table)`). Regarding the second comment. In the example, you had column names that start with either `a` followed by numbers (`^a\\d+`) or `b` followed by numbers ('^b\\d+`). This is given in the pattern. So, in general, if there are more columns say with `c` or `d` that start it can be also placed in the patterns. – akrun Aug 19 '15 at 02:40
  • Thanks Akrun - that's very helpful! – Vance L Albaugh Aug 19 '15 at 12:44
0

Here is a good summary about converting data between long and wide format.

Use R and SAS to reshape the data format: long to wide and wide to long

SixSigma
  • 2,808
  • 2
  • 18
  • 21