I need to create a new yes/no (or 1/0) variable from a group of existing ICD columns: it should be yes when any of those columns holds one of a set of specific values. My current code is: `inclusion %>% filter_at(vars("col1", "col2", "col3"), any_vars(. %in% c(49100, 49122, 48911, 404)))`. However, this only filters rows; it does not generate the final yes/no variable. Any suggestions?

Lisa
  • You should use the `dplyr` function `dplyr::mutate()`. With this function you can add/modify columns in your dataset. You cannot do this with `dplyr::filter_at()`. – van Nijnatten Mar 25 '21 at 16:33
  • Hi jing. Can you add a reprex? This will increase the chances of getting a concrete answer (see: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Marcelo Avila Mar 25 '21 at 19:40

1 Answer

Instead of `filter_at`, you could use `mutate(across(...))` together with `ifelse`:

inclusion %>%
   mutate(across(c(col1, col2, col3),
   ~ifelse(.x %in% c(49100, 49122, 48911, 404), TRUE, FALSE)))

This would overwrite those columns. If you want new columns instead, add a `.names` argument as follows:

inclusion %>%
   mutate(across(c(col1, col2, col3),
   ~ifelse(.x %in% c(49100, 49122, 48911, 404), TRUE, FALSE),
   .names = "{.col}_in_vec"))
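
To see what the `.names` version produces, here is a minimal sketch on invented toy data. Only the column names `col1` to `col3` and the code list come from the question; the `id` column, the values, and the `codes` and `flagged` objects are assumptions for illustration. It uses the documented `{.col}` placeholder and wraps the test in `as.integer()` to get 1/0 rather than `TRUE`/`FALSE`.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented
inclusion <- tibble(
  id   = 1:4,
  col1 = c(49100, 100, 200, 300),
  col2 = c(1, 404, 2, 3),
  col3 = c(5, 6, 7, 8)
)

# The codes of interest, taken from the question
codes <- c(49100, 49122, 48911, 404)

# Assign the result with `<-`; the original columns are kept and three
# new columns (col1_in_vec, col2_in_vec, col3_in_vec) are appended
flagged <- inclusion %>%
  mutate(across(c(col1, col2, col3),
                ~ as.integer(.x %in% codes),
                .names = "{.col}_in_vec"))

names(flagged)
```

The new columns only exist in `flagged`, not in `inclusion`, because `mutate()` returns a modified copy.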

If you want a single column indicating whether any of the three columns contains one of the values, use `rowwise()` with `c_across`:

inclusion %>%
   rowwise() %>%
   mutate(in_vec = any(c_across(c(col1, col2, col3)) %in% c(49100, 49122, 48911, 404)))
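
As a possible alternative that avoids `rowwise()`, the same flag can be sketched with `if_any()`, assuming dplyr 1.0.4 or later (where `if_any()` was introduced). The toy values and the `codes` and `result` objects below are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: column names follow the question, values are invented
inclusion <- tibble(
  col1 = c(49100, 1, 2),
  col2 = c(3, 404, 4),
  col3 = c(5, 6, 7)
)
codes <- c(49100, 49122, 48911, 404)

# if_any() returns one logical per row: TRUE when at least one of the
# selected columns satisfies the condition; as.integer() converts to 1/0
result <- inclusion %>%
  mutate(in_vec = as.integer(if_any(c(col1, col2, col3), ~ .x %in% codes)))

result
```

Because this stays vectorized, it should scale better than a row-by-row computation on large data, and the result is not left as a rowwise data frame.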
Will Hipson
  • It should retain all observations. You can swap the `TRUE`/`FALSE` for `1` and `0` if you prefer, but `mutate` won't drop observations. – Will Hipson Mar 25 '21 at 16:50
  • All obs retained; but how can I find the new variable created? For example, if any of the three columns has a value of 49100, a new variable with value =1 is needed. – Lisa Mar 25 '21 at 17:39
  • You need to ensure that you are (1) assigning the output to an object using `<-` and (2) in `across` use the `.names` argument (as shown in the 2nd example) to create new columns with the desired output. – Will Hipson Mar 25 '21 at 17:44
  • I think I understand your problem now. Check the third solution and let me know if that works. – Will Hipson Mar 25 '21 at 17:48
  • I tried the exact code `inclusion %>% rowwise() %>% mutate(in_vec = any(c_across(c(col1, col2, col3)) %in% c(49100, 49122, 48911, 404)))` and this is the error message: Error: `c_across()` must only be used inside dplyr verbs. – Lisa Mar 25 '21 at 17:58
  • If I run it on `mtcars` it seems to work: `mtcars %>% rowwise() %>% mutate(in_vec = any(c_across(c(mpg, cyl, disp)) %in% c(21, 6, 160)))` You could instead pipe the result from the 2nd solution into another `mutate` which looks like `mutate(in_vec = ifelse(any(col1, col2, col3), 1, 0))` – Will Hipson Mar 25 '21 at 18:01
  • Thank you very much - I will go back to the second solution – Lisa Mar 25 '21 at 18:07
  • I just noticed: if you're using the second solution, make sure you use the new variable names, so it would be `mutate(in_vec = ifelse(any(col1_in_vec, col2_in_vec, col3_in_vec), 1, 0))` – Will Hipson Mar 25 '21 at 18:10
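
One caveat with the `any()` approach from the last comments: unless the data frame is rowwise, `any()` looks at the whole columns at once and returns a single value that `mutate()` recycles to every row. A minimal sketch of the difference, using invented logical columns named as in the second solution (the `flags`, `collapsed`, and `per_row` objects are assumptions for illustration):

```r
library(dplyr)

# Hypothetical output of the second solution: one logical flag per source column
flags <- tibble(
  col1_in_vec = c(TRUE, FALSE, FALSE),
  col2_in_vec = c(FALSE, FALSE, FALSE),
  col3_in_vec = c(FALSE, TRUE, FALSE)
)

# any() without rowwise() collapses over all rows, so every row gets the same value
collapsed <- flags %>%
  mutate(in_vec = ifelse(any(col1_in_vec, col2_in_vec, col3_in_vec), 1, 0))

# Element-wise OR evaluates each row separately
per_row <- flags %>%
  mutate(in_vec = as.integer(col1_in_vec | col2_in_vec | col3_in_vec))

collapsed$in_vec
per_row$in_vec
```

Here the third row has no matching code, yet the `any()` version still flags it; the `|` version does not.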