I tried to wrap some code in a function in R for efficiency, but the function gives a different result from the original code, or no result at all.

When I run the code directly, the result is:

data1$CI_allergy <- str_extract(data1$CUR_ILL, "allergy")
data1$CI_allergy <- ifelse(data1$CI_allergy == "allergy", 1, 0)
data1$CI_allergy[is.na(data1$CI_allergy)] <- 0
data1$CI_allergy <- ifelse(data1$CI_allergy == 0, "N", "Y")

table(data1$CI_allergy)

      N       Y 
2714383   21642 

However, when the function is used:

CI_variable <- function(arg1, arg2) {
  data1$arg1 <- str_extract(data1$CUR_ILL, 'arg2') 
  data1$arg1 <- ifelse(data1$arg1 == 'arg2', 1, 0) 
  data1$arg1[is.na(data1$arg1)] <-0
  data1$arg1 <- ifelse(data1$arg1 == 0, "N", "Y") 
  return(table(data1$arg1))
}

CI_variable(CI_allergy, allergy)

    N 
2736025 

I am guessing the error occurs in the str_extract call inside CI_variable, but I am not sure. Has anyone had a similar problem and solved it?

AndrewGB
  • Welcome to SO! Please see [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) and [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It's important to provide some data, so that we can help determine where the error might occur. Provide some of your data via the output of `dput(head(data1))`. – AndrewGB Jan 09 '22 at 07:38
  • Expressions like `data1$arg1` assume that there is a column with the name "arg1" in `data1`. It will not replace `arg1` with `CI_allergy` as you want. To write your function properly, use, for example, `data1[[arg1]]` instead and call that function like this `CI_variable("CI_allergy", "allergy")`. – ekoam Jan 09 '22 at 07:39
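
The suggestion in the comment above can be sketched as follows. This is a minimal illustration, not the asker's actual setup: the toy `data1` is invented, and passing the data frame in as a `data` argument is an assumption made to keep the example self-contained. Note also that `'arg2'` in quotes is the literal text "arg2", not the value of the argument, so the original function searched `CUR_ILL` for the string "arg2" and found nothing, which is why every row came out as "N".

```r
library(stringr)

# Hypothetical toy data standing in for the asker's data1
data1 <- data.frame(
  CUR_ILL = c("seasonal allergy", "none", NA, "allergy to nuts", "asthma"),
  stringsAsFactors = FALSE
)

CI_variable <- function(data, new_col, pattern) {
  # [[ ]] uses the column whose name is stored in new_col;
  # data$new_col would refer to a column literally called "new_col"
  data[[new_col]] <- str_extract(data$CUR_ILL, pattern)
  # str_extract() returns NA when there is no match (or the input is NA)
  data[[new_col]] <- ifelse(is.na(data[[new_col]]), "N", "Y")
  table(data[[new_col]])
}

res <- CI_variable(data1, "CI_allergy", "allergy")
print(res)
```

Both arguments are passed as quoted strings. The function works on a local copy of the data frame, so `data1` itself is left unchanged; return `data` instead of the table if the new column should be kept.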

1 Answer

Since the original code includes str_extract, which is part of the tidyverse, here is an alternative approach.

First, some toy data (see how to make a reproducible example).

library(tidyverse)
df <- tribble(
   ~Cur_ILL,
  "something bad",
  "something allergy",
  "darkside",
  NA_character_
)

Then we can use several features of the tidyverse to get (dynamic) summary statistics, like so:

get_CI <- function(data, col, type){
  data %>%
    count("has_{type}" := ifelse(str_detect({{ col }}, type) %in% T, "Y", "N"))
}
get_CI(df, Cur_ILL, "allergy")

  has_allergy     n
  <chr>       <int>
1 N               3
2 Y               1

Explanation:

  1. count is a shortcut for computing the number of occurrences per group (here "Y" and "N"). Its output is a data.frame, which is a bit easier to work with than a table for most use cases.
  2. The walrus operator := makes it possible to use glue-style variable names on the left-hand side. Here that is "has_{type}", which inserts the value of the type argument into the column name. This makes it easier to distinguish between the resulting tables.
  3. The embrace operator {{ }} is a shortcut for passing on a column name that the caller supplied unquoted (here Cur_ILL).
  4. x %in% T (T is shorthand for TRUE) converts NA to FALSE, so rows with a missing value are counted as "N" rather than NA.
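
To illustrate point 4, here is a small base R sketch (the vector `x` is made up for the example) comparing `==` with `%in%` when the input contains `NA`:

```r
# A logical vector with a missing value, as str_detect() returns for NA input
x <- c(TRUE, FALSE, NA)

eq_result <- x == TRUE     # the NA is propagated
in_result <- x %in% TRUE   # the NA becomes FALSE

# With %in%, ifelse() never sees an NA, so every row gets "Y" or "N"
flags <- ifelse(in_result, "Y", "N")

print(eq_result)
print(in_result)
print(flags)
```

Writing `TRUE` in full rather than `T` is slightly safer, since `T` is an ordinary variable that can be reassigned.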

Finally, an explicit return statement is not required, because an R function returns the value of the last expression it evaluates.
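
If the goal is to keep the new column in the data, as in the question, rather than to count it, the same building blocks can be used with mutate. This is only a sketch: `add_CI` is a hypothetical helper name, the "CI_" prefix is chosen to mimic the question's column name, and the toy data mirrors the example above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, mirroring the example above
df <- data.frame(
  Cur_ILL = c("something bad", "something allergy", "darkside", NA_character_),
  stringsAsFactors = FALSE
)

# add_CI is a hypothetical helper, not part of any package
add_CI <- function(data, col, type) {
  # The pipeline is the last expression, so its result is returned implicitly
  data %>%
    mutate("CI_{type}" := ifelse(str_detect({{ col }}, type) %in% TRUE, "Y", "N"))
}

out <- add_CI(df, Cur_ILL, "allergy")
print(out)
```

The result is the original data with an extra column named CI_allergy. Assign it back (for example `df <- add_CI(df, Cur_ILL, "allergy")`) to keep the column.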

Donald Seinen