I'm trying to reshape() some time-varying data in R. I am working with the following dataset:
dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")
These are time-varying data from a longitudinal study, and a subset of a much larger dataset that I'm importing from source files. I'd like to extract the values pop, coe and rcb for both the baseline and final study visits (in my complete dataset there are several visits in between, which I've omitted for the purposes of this question).
I can do the following:
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = 2:length(dframe),direction='long')
However, this ends up with the values that should be in pop being labelled as coe. The documentation for reshape() tells me I should explicitly reference the varying values to avoid 'guessing'. So, I try this instead:
reshape(dframe,idvar='participant_id',v.names = c('pop','coe','rcb'),varying = c('baseline_pop','baseline_coe','baseline_rcb','final_pop','final_coe','final_rcb'),direction='long')
This results in exactly the same output, despite naming the varying argument explicitly. What am I doing wrong? Presumably, pop ends up with coe's values due to alphabetisation, but I can't understand why this has happened since I have now declared the varying argument explicitly...
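To make the alphabetisation theory concrete, here is a minimal sketch on a cut-down stand-in for dframe (the first three rows, with character ids; the names d, grouped and long are just for illustration). It assumes, without my having confirmed it, that reshape() groups a flat varying vector by v.names with something like split(), which returns its groups sorted by name. The second half passes varying as a list with one element per output variable, so that nothing needs to be grouped or guessed:

```r
# Cut-down stand-in for dframe: first three participants only
d <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC"),
  baseline_pop = c(6, 6, 7),
  baseline_coe = c(11.19, 13.6, 3.96),
  baseline_rcb = c(16.74, 25, 25),
  final_pop = c(NA, NA, 7.1),
  final_coe = c(NA, NA, 5.9263624),
  final_rcb = c(NA, NA, 25)
)

# Assumption: this mimics how a flat 'varying' vector might be grouped.
# split() converts the grouping labels to a factor, so the groups come
# back sorted by name (coe, pop, rcb), not in the order pop, coe, rcb.
grouped <- split(names(d)[-1], rep(c("pop", "coe", "rcb"), 2))
names(grouped)

# Passing 'varying' as a list pairs each v.names entry with its own
# columns by position, one list element per output variable.
long <- reshape(
  d,
  idvar = "participant_id",
  varying = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  v.names = c("pop", "coe", "rcb"),
  timevar = "time",
  times = 1:2,
  direction = "long"
)
long
```

If the theory holds, the list form should put each variable in its own column, whereas the flat vector form would attach the names pop, coe, rcb to groups that are actually ordered coe, pop, rcb.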
EDIT: The expected output would be as follows:
participant_id time pop coe rcb
FDVCZX 1 6 11.19 16.74
ADSCXZ 1 6 13.6 25
AESFDZC 1 7 3.96 25
ZXCV 1 6 7.64 18.37
AGS 1 6 6.12 25
AGSFV 1 6 6.92 25
FDVCZX 2 NA NA NA
ADSCXZ 2 NA NA NA
AESFDZC 2 7.1 5.926362 25
ZXCV 2 8 4.89 NA
AGS 2 6 11.98 NA
AGSFV 2 NA NA NA
However, as you will see if you run the code above, the pop values end up in the coe column, and vice versa.