5

I need to find the minimum values of three columns that are bigger than the values in another column. Say these five individuals entered a hospital in different months of the year, and they suffered several heart attacks before and after hospitalization. I need the first heart attack after hospitalization.


id<-c(100,105,108,200,205)
hosp<-c(3,5,2,6,2)
attack1<-c(1,6,3,4,1)
attack2<-c(4,7,9,10,NA)
attack3<-c(5,10,NA,NA,NA)
out<-c(7,12,11,12,9)

data <- data.frame(id,hosp,attack1,attack2,attack3,out)

   id hosp attack1 attack2 attack3 out
1 100    3       1       4       5   7
2 105    5       6       7      10  12
3 108    2       3       9      NA  11
4 200    6       4      10      NA  12
5 205    2       1      NA      NA   9

So the data should end up looking something like

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9     NA

This is my attempt which did not work:

min_f<-function(a){
  x<-min(a[a>hosp])
}

data %>% mutate_if(vars(attack1,attack2,attack3),min_f())
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
elliezee
  • 113
  • 4

3 Answers3

4

You can use the following solution.

  • Here c(...) refers to all variables in each row of your data set and I chose only those that starts with attack
  • Then I chose only those values that are greater than the corresponding value of hosp in each row and since you were looking for the first one that is greater than the value of hosp I used first function to extract that
  • ..2 also refers to the value of the second variable hosp in each row
library(dplyr)
library(purrr)

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5]; 
  first(sort(x[x > ..2]))}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9     NA

As an alternative as mentioned by dear Mr. @Greg in a very large data set, we can use min function in place of first(sort)) combination to ensure a faster evaluation time of the following solution. In case there is no value greater than hosp like in the last row min function would return Inf so I made sure that it would return the value 0 instead you can change it with the value you prefer:

data %>%
  mutate(afterh = pmap_dbl(., ~ {x <- c(...)[3:5];
  out <- min(x[x > ..2], na.rm = TRUE);
  if(!is.finite(out)) 0 else out}))

   id hosp attack1 attack2 attack3 out afterh
1 100    3       1       4       5   7      4
2 105    5       6       7      10  12      6
3 108    2       3       9      NA  11      3
4 200    6       4      10      NA  12     10
5 205    2       1      NA      NA   9      0
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
  • 1
    Do note that if the columns are out of order (`attack3` | `attack1` | `attack2`), we cannot rely on `first()` to find the _smallest_ value greater than `hosp`. – Greg Jun 11 '21 at 17:02
  • Thank you very much dear @Greg, you have pointed to a very subtle point. I made a slight modification. – Anoushiravan R Jun 11 '21 at 17:05
  • 1
    My pleasure! Though I do wonder if `first` ∘ `sort` would be as computationally efficient — on a lot of columns — as `min` (with something to ensure `NA` instead of `Inf` for an empty vector) – Greg Jun 11 '21 at 17:18
  • 1
    I made a slight modifications to my code. It was also a very good point, Thank you very much indeed. – Anoushiravan R Jun 11 '21 at 17:34
2
data %>% 
  # Nest attack columns
  nest(attacks = starts_with('attack')) %>% 
  # Only one row at a time
  rowwise() %>% 
  # Find first instance for each row
  mutate(afterh = first(attacks[attacks > hosp])) %>% 
  # Unnest attacks
  unnest(attacks)
Oliver
  • 8,169
  • 3
  • 15
  • 37
  • +1 Cleanest and most extensible solution so far. The nesting of all `attack*` columns was a prudent touch. **Do note** that if the columns are out of order, however, we cannot rely on `first()` to find the smallest value greater than `hosp`. – Greg Jun 11 '21 at 16:57
  • good point. Didn't consider it in my quick-and-dirty solution. – Oliver Jun 11 '21 at 17:07
  • I wouldn't consider it "dirty", per se. It keeps things pretty clean by leveraging `dplyr`, and it catches any and all `attack*` columns. – Greg Jun 11 '21 at 17:10
1

Nearly similar answer but using dplyr only

library(dplyr, warn.conflicts = F)

data %>% rowwise() %>%
  mutate(afterh = {xx <- select(cur_data(), starts_with('attack')); first(xx[xx > hosp])})

#> # A tibble: 5 x 7
#> # Rowwise: 
#>      id  hosp attack1 attack2 attack3   out afterh
#>   <dbl> <dbl>   <dbl>   <dbl>   <dbl> <dbl>  <dbl>
#> 1   100     3       1       4       5     7      4
#> 2   105     5       6       7      10    12      6
#> 3   108     2       3       9      NA    11      3
#> 4   200     6       4      10      NA    12     10
#> 5   205     2       1      NA      NA     9     NA

Created on 2021-06-12 by the reprex package (v2.0.0)

AnilGoyal
  • 25,297
  • 4
  • 27
  • 45