0

Good afternoon, I was hoping somebody could help me out with the following. My data looks like:

data <- read.table(text="
Level Score ID Date
1 30 A 5/10/2019        
2 50 A 5/11/2019
2 60 B 26/10/2019
4 90 B 26/11/2019
3 70 B 10/02/2020
4 92 C 29/09/2019
2 51 D 19/02/2020
4 95 D 19/05/2020
", header= TRUE)

I'd like to change my format from long to wide, based on the column 'ID' and 'Date'. I'd like to end up with the following data frame format:

data_wide <- read.table(text="
ID  Level_Time_Pre Level_Time_Post Level_Time_M3 Score_Time_Pre Score_Time_Post Score_Time_M3 
A 1 2 NA 30 50 NA   
B 2 4 3 60 90 70
C 2 NA NA 92 NA NA
D 2 NA 4 51 NA 95
", header= TRUE)

The pre-measures are always a month before the post-measures and the M3 measures are always around 3 months after the post-measures. Not everyone has all the measures, some only have pre, some only post, some only M3, others have 2 of the three and some all three...Also, the ID variable in real-life is more complicated and I have around 8500 observations (so, I cannot use, "ifelse (ID = A, ...)"). I am thinking that I first needs to create a factor variable ('Time') based on the 'Date' column before I can use the dcast function. However, I don't know how to best create this variable. I have tried: data %>% group_by(ID) %>% mutate (Time = paste("Pre", "Post", "M3")). This will give the Pre, Post, M3 in each cell of the new column...

I hope you can help me, THANKS!

  • Well, in the cases where you only have one observation for a group, how do you know whether it is pre, post or M3? I think it is impossible in this case to assign the observation to one of the three groups, since from your example, the months are not always the same (i.e., for instance pre is not always nov 2019). – SweetSpot Aug 07 '20 at 09:20
  • Good question, I have another data set where I have the same ID with the raw data of the scores. I was going to merge the composite scores with this set and move them into the right column based on where they have raw data. – Enrique Mergelsberg Aug 09 '20 at 23:18
  • I found a solution where I first widen the data and then clean it up: ` Data_wide <- reshape(Data, idvar= c('ID'), timevar='Date', direction='wide')`. This will create a new column for every date. Then I move the values to the left, deleting all the NAs: 'Data_wide_clean = as.data.frame(t(apply(Data_wide,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))' . Next, I remove the empty columns: 'Data_wide_clean <- Data_wide_clean[, colSums(is.na(Data_wide_clean)) != nrow(Data_wide_clean)]' and rename the columns: names(Data_wide_clean[1] <- "Level_Pre". – Enrique Mergelsberg Aug 10 '20 at 00:03

0 Answers0