I'm trying to reshape() some time-varying data in R. I am working with the following dataset:

dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")

These are time-varying data from a longitudinal study, and a subset of a much larger dataset that I'm importing from source files. I'd like to extract the values pop, coe and rcb for both the baseline and final study visits (in my complete dataset there are several visits in between, which I've omitted for the purposes of this question).

I can do the following:

reshape(dframe, idvar = 'participant_id', v.names = c('pop', 'coe', 'rcb'),
        varying = 2:length(dframe), direction = 'long')

However, this ends up with the values that should be in pop being labelled as coe. The documentation for reshape() tells me I should explicitly reference the varying values to avoid 'guessing'. So, I try this instead:

reshape(dframe, idvar = 'participant_id', v.names = c('pop', 'coe', 'rcb'),
        varying = c('baseline_pop', 'baseline_coe', 'baseline_rcb',
                    'final_pop', 'final_coe', 'final_rcb'),
        direction = 'long')

This results in exactly the same output, despite naming the varying argument explicitly. What am I doing wrong? Presumably, pop ends up with coe's values due to alphabetisation, but I can't understand why this has happened since I have now declared the varying argument explicitly...

EDIT: The expected output would be as follows:

participant_id  time    pop coe         rcb
FDVCZX          1       6   11.19       16.74
ADSCXZ          1       6   13.6        25
AESFDZC         1       7   3.96        25
ZXCV            1       6   7.64        18.37
AGS             1       6   6.12        25
AGSFV           1       6   6.92        25
FDVCZX          2       NA  NA          NA
ADSCXZ          2       NA  NA          NA
AESFDZC         2       7.1 5.926362    25
ZXCV            2       8   4.89        NA
AGS             2       6   11.98       NA
AGSFV           2       NA  NA          NA

However, as you will see, the pop values end up in the coe column, and vice versa.

CaptainProg

1 Answer

We can use melt from data.table, whose measure argument accepts several column patterns at once and returns one value column per pattern.

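library(data.table)
# setDT() converts dframe to a data.table by reference (no copy is made).
# Each pattern selects one group of wide columns; value.name labels them in the same order.
melt(setDT(dframe), measure.vars = patterns('pop', 'coe', 'rcb'),
     value.name = c('pop', 'coe', 'rcb'), variable.name = 'time')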
#    participant_id time pop       coe   rcb
# 1:         FDVCZX    1 6.0 11.190000 16.74
# 2:         ADSCXZ    1 6.0 13.600000 25.00
# 3:        AESFDZC    1 7.0  3.960000 25.00
# 4:           ZXCV    1 6.0  7.640000 18.37
# 5:            AGS    1 6.0  6.120000 25.00
# 6:          AGSFV    1 6.0  6.920000 25.00
# 7:         FDVCZX    2  NA        NA    NA
# 8:         ADSCXZ    2  NA        NA    NA
# 9:        AESFDZC    2 7.1  5.926362 25.00
#10:           ZXCV    2 8.0  4.890000    NA
#11:            AGS    2 6.0 11.980000    NA
#12:          AGSFV    2  NA        NA    NA
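
For anyone who prefers to stay in base R, here is a minimal sketch of what seems to cause the swap and one way around it. It assumes that reshape() groups a flat varying vector with something like split(), which orders the groups alphabetically (coe, pop, rcb) while v.names keeps the order you typed. Passing varying as a list, one element per long-format variable, removes the ambiguity. The data frame below is a simplified copy of the question's data with participant_id stored as plain character, and the name long is only illustrative.

```r
# Simplified copy of the question's data (participant_id as character, not factor).
dframe <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC", "ZXCV", "AGS", "AGSFV"),
  baseline_pop = c(6, 6, 7, 6, 6, 6),
  baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92),
  baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25),
  final_pop = c(NA, NA, 7.1, 8, 6, NA),
  final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA),
  final_rcb = c(NA, NA, 25, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Illustration only: split() orders its groups alphabetically, which is the
# assumed reason a flat 'varying' vector gets paired with the wrong v.names.
wide_cols <- c("baseline_pop", "baseline_coe", "baseline_rcb",
               "final_pop", "final_coe", "final_rcb")
grouping <- names(split(wide_cols, rep(c("pop", "coe", "rcb"), 2)))

# Workaround: give 'varying' as a list with one element per long variable,
# listed in the same order as 'v.names'.
long <- reshape(
  dframe,
  idvar = "participant_id",
  varying = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  v.names = c("pop", "coe", "rcb"),
  timevar = "time",
  times = 1:2,
  direction = "long"
)
rownames(long) <- NULL  # drop the "id.time" row names that reshape() generates
long
```

Unlike the melt() call, this leaves dframe as an ordinary data.frame and returns time as an integer column rather than a factor.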
akrun