I've been reading through the documentation for the reshape2 package in R, and for other packages that restructure data between wide and long formats. However, I'm stuck on this problem because I need to create a new variable for time and group my hormone measurements by it. I previously did this in SPSS, but I am making the switch to R for obvious reasons, like many others. I know there must be an easy way to do this in R, but I'm having trouble figuring it out.
The data come from a longitudinal clinical study in which 20 different hormones were measured at 5 time points for each patient (made-up example data is below: a1 is hormone 'a' at visit 1, a2 is hormone 'a' at visit 2, and so on). There are 20 patients in the study, each with a unique identifier in the spreadsheet (id). The hormone data (hormone 'a', hormone 'b', etc.) is arranged in wide form in my spreadsheet as follows:
> id a1 a2 a3 a4 a5 b1 b2 b3 b4 b5...
> 1 21 50 28 19 15 24 90 40 35 20...
> 2 23 45 15 22 20 25 45 34 31 22...
> 3 29 88 33 32 21 78 32 33 45 21...
...
When I previously did this in SPSS, the software prompted me for the id variable and for the variable names to collapse the longitudinal measurements into. I would create a new variable called "visit", running from 1 to 5, corresponding to the 5 measurements I have for each hormone. SPSS then created a new output in long format that looks like this:
> id visit a b
> 1 1 21 24
> 1 2 50 90
> 1 3 28 40
> 1 4 19 35
> 1 5 15 20
> 2 1 23 25
> 2 2 45 45
> 2 3 15 34
> 2 4 22 31
> 2 5 20 22
> 3 1 29 78
...
I've tried using reshape, and the function appears to work, but when I look at the actual data the numbers get mixed up between the wide and long formats. I must be doing something very basic wrong, but I'm having difficulty figuring out what.
d_long <- reshape(d,
                  varying = c("a1", "a2", "a3", "a4", "a5",
                              "b1", "b2", "b3", "b4", "b5"),
                  v.names = c("a", "b"),
                  idvar = "id", times = 1:5, direction = "long")
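To make the mix-up easy to reproduce, here is a minimal sketch that rebuilds only the three sample rows shown above (hormones a and b, since the remaining columns are elided) and compares patient 1 in the reshaped data against the values in the wide row. The names p1, expected_a and expected_b are just illustrative helpers, not part of my real script, and the data are the same made-up numbers as above.

```r
# Three made-up sample rows from above; only hormones a and b are included.
d <- read.table(header = TRUE, text = "
id a1 a2 a3 a4 a5 b1 b2 b3 b4 b5
1 21 50 28 19 15 24 90 40 35 20
2 23 45 15 22 20 25 45 34 31 22
3 29 88 33 32 21 78 32 33 45 21
")

# The same reshape() call as in my attempt, unchanged.
d_long <- reshape(d,
                  varying = c("a1", "a2", "a3", "a4", "a5",
                              "b1", "b2", "b3", "b4", "b5"),
                  v.names = c("a", "b"),
                  idvar = "id", times = 1:5, direction = "long")

# Patient 1 only, ordered by visit (reshape() names the new column "time").
p1 <- d_long[d_long$id == 1, ]
p1 <- p1[order(p1$time), c("id", "time", "a", "b")]
print(p1)

# What I expect for patient 1, read straight off the wide row.
expected_a <- c(21, 50, 28, 19, 15)
expected_b <- c(24, 90, 40, 35, 20)

# Both comparisons should print TRUE if the reshape did what I want.
print(identical(as.numeric(p1$a), expected_a))
print(identical(as.numeric(p1$b), expected_b))
```

If either comparison prints FALSE, the values in a and b are not lining up with the visits they came from, which is the problem I'm describing.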