1

I have a dataset that is arranged like this:

ID   A   B  C   D   Win  Loss
001  NA  3  NA  NA  6    NA  
002  NA  NA NA  NA  NA   17
003  1   5  12  18  NA   22
004  NA  7  9   NA  31   NA
005  8   2  NA  NA  NA   14
006  2   6  12  19  25   NA
007  NA  NA NA  NA  6    NA 

In this dataset, IDs are supposed to move through each stage (A, B, C, D) chronologically and hit either Win/Loss at the end (ID 003 and 006)

Sometimes, however, IDs move backwards (ID 005), others skip stages (ID 001 and 004) and some go directly to Win/Loss (ID 002 and 007).

I would like to call these out with dplyr mutate logic. Output:

ID   A   B  C   D   Win  Loss  Backwards Skip  Just W/L
001  NA  3  NA  NA  6    NA    F         T     F
002  NA  NA NA  NA  NA   17    F         T     T
003  1   5  12  18  NA   22    F         F     F
004  NA  7  9   NA  31   NA    F         T     F
005  8   2  NA  NA  NA   14    T         T     F
006  2   6  12  19  25   NA    F         F     F
007  NA  NA NA  NA  6    NA    F         T     T

I know that I should be using logic similar to this, but I I just can't figure it out.

Thanks in advance.

EDIT:

Bonus points if you can also tell me how to count the time/days elapsed between each stage even if it skips a couple.

Community
  • 1
  • 1
emehex
  • 9,874
  • 10
  • 54
  • 100

1 Answers1

3

This could be done using base R. Select the columns of interest df[LETTERS[1:4]], create a logical matrix of presence/absence of "NAs" (is.na(df[...)). Negate it (!) so that non-NA elements become "TRUE", get the row wise sum rowSums and negate it ! so the rows with 0 non-NA values will become TRUE, and that would be the case with just a win or loss. For the "backwards", we can use a loop method (apply() for rows (MARGIN=1), check whether the difference of non-NA elements (diff(na.omit(x))) have any negative number (any(....)<0). If there is one, it indicates, IDs moved backwards. The "Skip" is also similar to "JustWL" where we get the logical matrix (is.na(..)), sum by row (rowSums) and double negate (!!). If there is at least one "NA", this will be TRUE.

JustWL <- !rowSums(!is.na(df[LETTERS[1:4]]))
Backwards <- apply(df[LETTERS[1:4]], 1, function(x) any(diff(na.omit(x))<0))
Skip <- !!rowSums(is.na(df[LETTERS[1:4]]))

df1 <- data.frame(df, JustWL, Backwards, Skip)
df1
#   ID  A  B  C  D Win Loss JustWL Backwards  Skip
# 1  1 NA  3 NA NA   6   NA  FALSE     FALSE  TRUE
# 2  2 NA NA NA NA  NA   17   TRUE     FALSE  TRUE 
# 3  3  1  5 12 18  NA   22  FALSE     FALSE FALSE
# 4  4 NA  7  9 NA  31   NA  FALSE     FALSE  TRUE
# 5  5  8  2 NA NA  NA   14  FALSE      TRUE  TRUE
# 6  6  2  6 12 19  25   NA  FALSE     FALSE FALSE
# 7  7 NA NA NA NA   6   NA   TRUE     FALSE  TRUE

Or using the same code in dplyr

library(dplyr)
 df %>% 
    mutate(JustWL=!rowSums(!is.na(.[LETTERS[1:4]])), 
           Skip=!!rowSums(is.na(.[LETTERS[1:4]]))) %>%
           rowwise() %>% 
           do(data.frame(., Backwards= 
               any(diff(na.omit(unlist(.[LETTERS[1:4]])))<0)))
akrun
  • 874,273
  • 37
  • 540
  • 662