Calculate reliable improvement pre to post assessment from unbalanced data in long format

Question

I would like to calculate reliable improvement or worsening from session 1 to "the last session" in an unbalanced data set organized in long format. Each ID has a different number of sessions, so the last session differs per ID.

The data I have looks like this:

library(data.table)

ID <- c("A","A","B","B","B","C","C","C","C")
Session <- c(1,2,1,2,3,1,2,3,4)
Value <- c(10,6,25,35,15,20,25,35,35)
Have <- data.table(ID, Session, Value)
Have

ID Session Value
 A       1    10
 A       2     6 
 B       1    25
 B       2    35
 B       3    15
 C       1    20
 C       2    25
 C       3    35
 C       4    35

The data I need would look like this, with the change from the first to the last session repeated on every row of each ID:

Change <- c(-4,-4,-10,-10,-10,15,15,15,15)


Need <- data.table(ID,Session, Value,Change)
Need

ID Session Value Change
 A       1    10     -4
 A       2     6     -4
 B       1    25    -10
 B       2    35    -10
 B       3    15    -10
 C       1    20     15
 C       2    25     15
 C       3    35     15
 C       4    35     15

I have tried this:

Have$change<-as.vector(unlist(tapply(Have$Value,Have$ID,FUN=function(x){return (x-rep(x[1],length(x)))})));
Have
ID Session Value change
A       1    10      0
A       2     6     -4
B       1    25      0
B       2    35     10
B       3    15    -10
C       1    20      0
C       2    25      5
C       3    35     15
C       4    35     15

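I used code from this post: "Calculating change from baseline with data in long format". As the output shows, it gives the change from session 1 at every session, not a single first-to-last change per ID.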
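The attempt above subtracts the baseline from every row, which is why it yields one change value per session. A minimal base R sketch of the intended calculation (last value minus first value, repeated for each row of an ID) could look like the following. It assumes the rows can be sorted by `Session` within `ID` and that `Value` has no missing entries; the data frame is rebuilt here only to keep the example self-contained.

```r
# Example data from the question, as a plain data.frame
Have <- data.frame(
  ID      = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Session = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  Value   = c(10, 6, 25, 35, 15, 20, 25, 35, 35)
)

# Sort so the first/last row within each ID is the first/last session
Have <- Have[order(Have$ID, Have$Session), ]

# Per ID: last Value minus first Value, repeated for every row of that ID
Have$Change <- ave(
  Have$Value, Have$ID,
  FUN = function(x) rep(x[length(x)] - x[1], length(x))
)

Have
```

With this approach an ID that has only one session gets a change of 0, because its first and last values coincide.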

score 0 · Accepted Answer · edited May 23 '17 at 10:28


Not the prettiest code, but I think it does what you want. I don't really know data.table, so I used dplyr instead. I also got a little help on how to select the first and last rows of each group from this answer: https://stackoverflow.com/a/31529043/4651564

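library(dplyr)

# Work on a plain data.frame rather than a data.table
Have <- as.data.frame(Have)

# Keep only the first and last session of each ID, then take their difference
Have2 <- Have %>% 
    arrange(Session) %>% 
    group_by(ID) %>% 
    filter(row_number() %in% c(1, n())) %>% 
    summarise(change = diff(Value))

# Attach the per-ID change to every row of the original data
Have %>% left_join(Have2, by = "ID")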

Edit: I updated my code a bit to simplify it.
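For anyone who would rather stay in data.table, here is a minimal sketch of the same idea. It is not taken from the original answer. It assumes `Session` has no missing values and no ties within an `ID`, and it picks the first and last sessions with `which.min` and `which.max`, so the rows do not need to be sorted beforehand.

```r
library(data.table)

# Example data from the question
Have <- data.table(
  ID      = c("A", "A", "B", "B", "B", "C", "C", "C", "C"),
  Session = c(1, 2, 1, 2, 3, 1, 2, 3, 4),
  Value   = c(10, 6, 25, 35, 15, 20, 25, 35, 35)
)

# Per ID: Value at the highest Session minus Value at the lowest Session.
# := adds the column by reference and recycles the single value within each group.
Have[, Change := Value[which.max(Session)] - Value[which.min(Session)], by = ID]

Have
```

Because the result is assigned by reference inside each group, no separate join back to the original table is needed.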

answered Jun 30 '16 at 12:28 by gowerc · edited May 23 '17 at 10:28 by Community

Hi, thanks for your reply! The code works fine on the example above. I have tried it on my original data set, but it returns "Error: expecting a single value". The original data is a data.frame, as in the example. I have changed the ID variable in my data set from factor to character (as in the example), but I still get the error. The ID variable looks like, for example, "BNCS01"; Session is numeric (same as in the example, but it ranges from 1 to 29); and Value is numeric, ranging from 0 to 40 (with 3 decimals). Any suggestions on why I get the error message? – Carl Jul 01 '16 at 08:27
Hi @Carl, I would guess that the issue is that one of your groups has only 1 observation in it. I didn't take into account that when only 1 observation is passed to `diff`, it returns a vector of length 0, whereas dplyr expects a vector of length 1. To solve this, you can write and use your own diff function, which has the added benefit of giving you more control over how to handle outlying or missing values. You could, for example, use the following, which removes NAs first: `DIFF <- function(x) { x <- x[!is.na(x)]; if (length(x) == 0) return(NA) else return(max(x) - min(x)) }` – gowerc Jul 03 '16 at 16:00
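Note that `max(x) - min(x)` returns the unsigned range, so it cannot distinguish improvement from worsening. A possible signed variant is sketched below. The helper name `signed_change` and the extra single-session ID "D" are made up for illustration, and returning 0 for a single observation is one choice among several (returning `NA` would be equally defensible).

```r
library(dplyr)

# Hypothetical helper: last non-missing value minus first non-missing value.
# Returns NA when nothing is left after dropping NAs, and 0 for a single value.
signed_change <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  x[length(x)] - x[1]
}

# Example data from the question plus an illustrative ID "D" with one session
Have <- data.frame(
  ID      = c("A", "A", "B", "B", "B", "C", "C", "C", "C", "D"),
  Session = c(1, 2, 1, 2, 3, 1, 2, 3, 4, 1),
  Value   = c(10, 6, 25, 35, 15, 20, 25, 35, 35, 12)
)

Result <- Have %>%
  arrange(ID, Session) %>%                  # first/last row per ID = first/last session
  group_by(ID) %>%
  mutate(Change = signed_change(Value)) %>% # single value is recycled within each group
  ungroup()

Result
```

Using `mutate` instead of `summarise` keeps every row of the original data, so the separate `left_join` step is not needed.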
