I have a dataset with daily measurements of the same blood marker for the same subjects. All patients are "positive" to begin with, and most change to "negative" at some point. Some remain negative until the end of the experiment, while others turn positive again after having been negative.

I am trying to identify those who turn positive after negative.

I tried `ifelse` and the `lag` function in dplyr but could not get anywhere.

Here is an example of what my data looks like:

subject      day1      day2      day3      day4      day5      day6      day7
1       A positive  positive  positive  positive  positive  positive  positive 
2       B positive  positive   negative positive   negative  negative  negative
3       C positive  positive  positive   negative  negative positive  positive 
4       D positive  positive  positive   negative  negative  negative  negative
jaco0646
Bahi8482

3 Answers


You can reshape the data to long format and, for each subject, return TRUE if any value is "positive" immediately after a "negative".

library(dplyr)
df %>%
  tidyr::pivot_longer(cols = -subject) %>%
  group_by(subject) %>%
  summarise(pos_aft_neg = any(value == 'positive' & 
                              lag(value) == 'negative', na.rm = TRUE)) %>%
  left_join(df, 'subject')


# A tibble: 4 x 9
#  subject pos_aft_neg day1     day2     day3     day4     day5     day6     day7    
#  <chr>   <lgl>       <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
#1 A       FALSE       positive positive positive positive positive positive positive
#2 B       TRUE        positive positive negative positive negative negative negative
#3 C       TRUE        positive positive positive negative negative positive positive
#4 D       FALSE       positive positive positive negative negative negative negative

In base R, you can use `apply` row-wise:

df$pos_aft_neg <- apply(df, 1, function(x) 
                      any(x[-1] == 'positive' & x[- length(x)] == 'negative'))
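
Because `apply(df, 1, ...)` walks over every column, including `subject` and any column added later, a variant that looks only at the day columns may be safer to re-run. The following is a minimal sketch, assuming the measurement columns all start with "day"; the names `day_cols` and `m` are illustrative and not part of the original answer.

```r
# Example data matching the question
df <- data.frame(
  subject = c("A", "B", "C", "D"),
  day1 = rep("positive", 4),
  day2 = rep("positive", 4),
  day3 = c("positive", "negative", "positive", "positive"),
  day4 = c("positive", "positive", "negative", "negative"),
  day5 = c("positive", "negative", "negative", "negative"),
  day6 = c("positive", "negative", "positive", "negative"),
  day7 = c("positive", "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Assumption: every measurement column is named day1, day2, ...
day_cols <- grep("^day", names(df), value = TRUE)
m <- as.matrix(df[day_cols])

# Compare each day (columns 2..n) with the day before it (columns 1..n-1)
turned <- m[, -1, drop = FALSE] == "positive" &
  m[, -ncol(m), drop = FALSE] == "negative"

# A subject is flagged if at least one such transition occurs
df$pos_aft_neg <- unname(rowSums(turned) > 0)
```

Since only the `day` columns enter the comparison, running this twice on the same data frame gives the same result.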
Ronak Shah
  • Thank you. The `apply` function worked great. The dplyr version gave this error: ERROR: Can't combine `day1` and `pos_aft_neg`. Any suggestions? – Bahi8482 Jun 05 '20 at 06:31
  • Let me guess: you ran the `apply` answer first, which added a new column `pos_aft_neg`, and that column is causing the error. Try running the code on the original data without the `pos_aft_neg` column. – Ronak Shah Jun 05 '20 at 06:35
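
One way to avoid that clash, sketched here under the assumption that the measurement columns all start with "day", is to drop any leftover `pos_aft_neg` column and pivot only the day columns. The small `df` below is made-up illustration data, not the asker's dataset.

```r
library(dplyr)
library(tidyr)

# Illustrative data with a leftover logical column from an earlier run
df <- data.frame(
  subject = c("A", "B"),
  day1 = c("positive", "positive"),
  day2 = c("negative", "negative"),
  day3 = c("negative", "positive"),
  pos_aft_neg = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

res <- df %>%
  select(-any_of("pos_aft_neg")) %>%          # drop the old flag if present
  pivot_longer(cols = starts_with("day")) %>% # pivot only the character day columns
  group_by(subject) %>%
  summarise(pos_aft_neg = any(value == "positive" &
                                lag(value) == "negative", na.rm = TRUE))
```

Pivoting only columns of one type sidesteps the "Can't combine" error, which arises when character and logical columns are stacked into a single `value` column.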

This also gives you the day on which each subject turned positive after being negative:

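library(dplyr)
library(tidyr)  # pivot_longer() comes from tidyr, not dplyr
df %>% 
  pivot_longer(day1:day7) %>% 
  group_by(subject) %>% 
  # previous day's value within each subject (NA on day1)
  mutate(value_day_before = lag(value)) %>% 
  filter(value == "positive" & value_day_before == "negative")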

# A tibble: 2 x 4
# Groups:   subject [2]
  subject name  value    value_day_before
  <chr>   <chr> <chr>    <chr>           
1 B       day4  positive negative        
2 C       day6  positive negative
Ahorn

A data.table approach

Sample data

library( data.table )
DT <- fread("subject      day1      day2      day3      day4      day5      day6      day7
       A positive  positive  positive  positive  positive  positive  positive 
       B positive  positive   negative positive   negative  negative  negative
       C positive  positive  positive   negative  negative positive  positive 
       D positive  positive  positive   negative  negative  negative  negative")

Method 1: create a marker column for pos-after-neg

#melt to long format
DT.long <- melt( DT, "subject" )
#get pos-after-neg by subject, create marker-column
DT.long[ DT.long[, .I[ value == "positive" & shift( value, fill = "positive", type = "lag" ) == "negative" ], by = subject ]$V1, marker := 1]

#     subject variable    value marker
#  1:       A     day1 positive     NA
#  2:       B     day1 positive     NA
#  3:       C     day1 positive     NA
#  4:       D     day1 positive     NA
#  5:       A     day2 positive     NA
#  6:       B     day2 positive     NA
#  7:       C     day2 positive     NA
#  8:       D     day2 positive     NA
#  9:       A     day3 positive     NA
# 10:       B     day3 negative     NA
# 11:       C     day3 positive     NA
# 12:       D     day3 positive     NA
# 13:       A     day4 positive     NA
# 14:       B     day4 positive      1
# 15:       C     day4 negative     NA
# 16:       D     day4 negative     NA
# 17:       A     day5 positive     NA
# 18:       B     day5 negative     NA
# 19:       C     day5 negative     NA
# 20:       D     day5 negative     NA
# 21:       A     day6 positive     NA
# 22:       B     day6 negative     NA
# 23:       C     day6 positive      1
# 24:       D     day6 negative     NA
# 25:       A     day7 positive     NA
# 26:       B     day7 negative     NA
# 27:       C     day7 positive     NA
# 28:       D     day7 negative     NA
#     subject variable    value marker

Method 2: filter the relevant rows

#melt to long format
DT.long <- melt( DT, "subject" )
#get pos-after-neg row numbers by subject, keep only those rows
DT.long[ DT.long[, .I[ value == "positive" & shift( value, fill = "positive", type = "lag" ) == "negative" ], by = subject ]$V1, ][]

#    subject variable    value
# 1:       B     day4 positive
# 2:       C     day6 positive
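
If a single TRUE/FALSE flag per subject is wanted instead of the matching rows, the same comparison can be collapsed with `any()`. This is a minimal sketch, not part of the original answer; the column name `pos_aft_neg` and the object `flags` are illustrative, and the data is rebuilt here so the block runs on its own.

```r
library(data.table)

# Same sample data as above, built column by column
DT <- data.table(
  subject = c("A", "B", "C", "D"),
  day1 = rep("positive", 4),
  day2 = rep("positive", 4),
  day3 = c("positive", "negative", "positive", "positive"),
  day4 = c("positive", "positive", "negative", "negative"),
  day5 = c("positive", "negative", "negative", "negative"),
  day6 = c("positive", "negative", "positive", "negative"),
  day7 = c("positive", "negative", "positive", "negative")
)

# Melt to long format; rows stay in day order within each subject
DT.long <- melt(DT, id.vars = "subject")

# One row per subject: TRUE if any "positive" directly follows a "negative"
flags <- DT.long[, .(pos_aft_neg = any(
  value == "positive" &
    shift(value, fill = "positive", type = "lag") == "negative"
)), by = subject]
```

`flags` can then be joined back onto `DT` by `subject` if the wide layout is needed.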
Wimpel