How to create a rank for a variable in a longitudinal dataset based on a condition?

Question

I have a longitudinal dataset where each subject is represented more than once. Each row represents one admission for a patient, and each admission, regardless of the subject, also has a unique "Key". I need to figure out which admission is the "INDEX" admission, that is, the first admission, so that I know which rows are the subsequent readmissions. The variable to use is "Daystoevent"; the lowest number represents the INDEX admission. I want to create a new variable based on the condition that, for each subject, the row with the lowest "Daystoevent" is the "index" admission and each subsequent admission gets a number "1", "2", etc. I want to do this WITHOUT converting to the horizontal (wide) format.

The dataset looks like this:

Subject Daystoevent Key
A       5           rtwe
A       8           erer
B       3           tter
B       8           qgfb
A       2           sada
C       4           ccfw
D       7           mjhr
B       4           sdfw
C       1           srtg
C       2           xcvs
D       3           muyg

Would appreciate some help.
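The rule described above (within each subject, rank the admissions by Daystoevent and start counting at 0 for the index admission) can also be sketched in base R with `ave()` and `rank()`, which keeps the data in long format. This is an illustrative sketch, not part of the original question; `ties.method = "first"` is an assumption about how tied Daystoevent values within a subject should be handled (they get distinct numbers in row order).

```r
# Sample data from the question (long format: one row per admission)
df <- data.frame(
  Subject     = c("A", "A", "B", "B", "A", "C", "D", "B", "C", "C", "D"),
  Daystoevent = c(5, 8, 3, 8, 2, 4, 7, 4, 1, 2, 3),
  Key         = c("rtwe", "erer", "tter", "qgfb", "sada", "ccfw",
                  "mjhr", "sdfw", "srtg", "xcvs", "muyg"),
  stringsAsFactors = FALSE
)

# Within each Subject, rank the admissions by Daystoevent.
# ties.method = "first" breaks ties by row order so every row gets a
# distinct number; subtracting 1 makes the index admission 0.
df$Admission <- ave(df$Daystoevent, df$Subject,
                    FUN = function(x) rank(x, ties.method = "first")) - 1

# Label the index admission; the column becomes character ("index", "1", ...)
df$Admission[df$Admission == 0] <- "index"

# Sort for display; the data stay in long format throughout
df[order(df$Subject, df$Daystoevent), ]
```

The rows keep their original order in `df`; only the printed view is sorted.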

Can you provide an example of what the correct output would look like on this sample data? Thanks :) — mysteRious, Jul 29 '18 at 04:21

score 0 · Answer 1 · answered Jul 29 '18 at 05:12

This may not be an elegant solution, but it will do the job:

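library(dplyr)

df <- df %>%
  group_by(Subject) %>%
  arrange(Subject, Daystoevent) %>%
  # 0 marks the index (earliest) admission, 1 marks any readmission
  mutate(
    Admission = if_else(Daystoevent == min(Daystoevent), 0, 1)
  ) %>%
  ungroup()

# Rows are now sorted by Subject and Daystoevent, so each readmission
# continues the count of the row directly above it; index rows (0) restart it
for (i in seq_len(nrow(df) - 1)) {
  if (df$Admission[i + 1] != 0) {
    df$Admission[i + 1] <- df$Admission[i] + 1
  }
}

# Label only the Admission column (Daystoevent may legitimately contain 0)
df$Admission[df$Admission == 0] <- "index"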

df
# # A tibble: 11 x 4
#    Subject Daystoevent Key   Admission
#    <chr>         <dbl> <chr> <chr>    
#  1 A                 2 sada  index    
#  2 A                 5 rtwe  1        
#  3 A                 8 erer  2        
#  4 B                 3 tter  index    
#  5 B                 4 sdfw  1        
#  6 B                 8 qgfb  2        
#  7 C                 1 srtg  index    
#  8 C                 2 xcvs  1        
#  9 C                 4 ccfw  2        
# 10 D                 3 muyg  index    
# 11 D                 7 mjhr  1

Data:

df <- tibble(
  Subject = c("A", "A", "B", "B", "A", "C", "D", "B", "C", "C", "D"),
  Daystoevent = c(5, 8, 3, 8, 2, 4, 7, 4, 1, 2, 3),
  Key = c("rtwe", "erer", "tter", "qgfb", "sada", "ccfw", "mjhr", "sdfw", "srtg", "xcvs", "muyg")
)
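As a loop-free alternative, here is a sketch that lets dplyr number the rows within each subject: after sorting by Daystoevent, `row_number() - 1` is 0 for the earliest admission and counts up by one for each readmission, however many there are. Unlike the `min()` comparison, this gives tied Daystoevent values distinct numbers in their sorted order, which is an assumption about how ties should be treated.

```r
library(dplyr)

df <- tibble(
  Subject = c("A", "A", "B", "B", "A", "C", "D", "B", "C", "C", "D"),
  Daystoevent = c(5, 8, 3, 8, 2, 4, 7, 4, 1, 2, 3),
  Key = c("rtwe", "erer", "tter", "qgfb", "sada", "ccfw", "mjhr", "sdfw", "srtg", "xcvs", "muyg")
)

df <- df %>%
  arrange(Subject, Daystoevent) %>%          # earliest admission first within each subject
  group_by(Subject) %>%
  mutate(Admission = row_number() - 1) %>%   # 0 = index, then 1, 2, ...
  ungroup() %>%
  # Same labelling as above: "index" for 0, the count as text otherwise
  mutate(Admission = if_else(Admission == 0, "index", as.character(Admission)))

df
```

Because the numbering comes from the position within each group, no row depends on the row above it, so subjects with one, two, or ten readmissions are handled the same way.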

Thanks a lot. The first part seems to work well, but for the second part, I am guessing the script you wrote only works if the maximum number of times a patient was readmitted is two (i.e. max(Admission) = 2), as in the example I gave? If it's not 2 but, let's say, 10, then what? Sorry if this is too convoluted; I'm new to this. — Mohammed Ali Alvi, Jul 29 '18 at 09:44
@MohammedAliAlvi, I don't believe that's the case here. This code is flexible enough to handle any number of additional admissions. There is likely another problem with the original dataset that the example does not represent. Please make your question [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), preferably using `dput`. — OzanStats, Jul 29 '18 at 14:03
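To check the concern about more than two readmissions directly, here is a small test on hypothetical data (subjects X, Y and Z are invented for illustration: X has 10 admissions, Y has 2, Z has 1). It uses the same grouping step as the answer and a loop in which each non-index row continues the count of the row above it. Note that a branch like `if (df$Admission[i] == 1)` that unconditionally sets the next row to 2 would overwrite the index row of the following subject whenever a subject has exactly one readmission (Y followed by Z here); this version only updates rows that are not index rows.

```r
library(dplyr)

# Hypothetical data: X has 10 admissions, Y has 2, Z has 1
df <- tibble(
  Subject = c(rep("X", 10), "Y", "Y", "Z"),
  Daystoevent = c(10:1, 7, 3, 4)
)

df <- df %>%
  arrange(Subject, Daystoevent) %>%
  group_by(Subject) %>%
  # 0 = index admission, 1 = readmission (to be renumbered below)
  mutate(Admission = if_else(Daystoevent == min(Daystoevent), 0, 1)) %>%
  ungroup()

# Each non-index row continues the count of the row above it
for (i in seq_len(nrow(df) - 1)) {
  if (df$Admission[i + 1] != 0) {
    df$Admission[i + 1] <- df$Admission[i] + 1
  }
}

df$Admission
```

Subject X ends up numbered 0 through 9, Y gets 0 and 1, and Z keeps 0, so the count is not capped at 2.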