
I have a script that successfully creates a new column in an existing data frame, using mutate combined with str_detect to indicate whether a drug component is present in an existing variable. I would like to turn this script into a function so that it is easier to use repeatedly. My attempts to create the function have failed.

Here is the script:

Drug_Table_Names <- data.frame(mutate(
    Drug_Table_Names, 
    DRUG_GENERIC_NAME, 
    Flurbiprofen = str_detect(Drug_Table_Names$DRUG_GENERIC_NAME,"FLURBIPROFEN", negate = FALSE)
))

The script finds Flurbiprofen in the column DRUG_GENERIC_NAME and creates a new column named Flurbiprofen, which is TRUE if Flurbiprofen is present.

My attempt to create a function FlagDrugNames was written this way:

function(drug_flag, gen_name){
    Drug_Table_Names <- data.frame(mutate(
        Drug_Table_Names, 
        DRUG_GENERIC_NAME, 
        drug_flag = str_detect(Drug_Table_Names$DRUG_GENERIC_NAME,
                               "gen_name", negate = FALSE)
    ))
}

Where drug_flag is the string the function looks for (in the example above it is Flurbiprofen), and gen_name is the name of the column it creates. This function does not work.

I would appreciate any help with the function.

alistaire
  • In the function `str_detect` is looking for the *literal* `"gen_name"`, not the value of a *function argument* `gen_name`. Also, 1) in `str_detect` argument `negate = FALSE` is already the default; 2) the last function code line should be return value, `Drug_Table_Names`. – Rui Barradas Jun 30 '19 at 06:55
  • More things: 1. `mutate` already returns a data frame. If you want to untibble, use `as.data.frame`. 2. Putting `DRUG_GENERIC_NAME` in `mutate` doesn't do anything. 3. Generally don't use `$` inside of dplyr functions; use the bare column name. 4. If you need to set a column name in a function in `mutate`, [you'll have to use `:=`](https://dplyr.tidyverse.org/articles/programming.html). 5. You should assign the function to something and supply some sample data and a call [to make your example reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – alistaire Jun 30 '19 at 07:08
  • Rui Barradas is correct. I would add that your data argument isn't quite right. Your function takes in `drug_flag` and `gen_name`. I presume `drug_flag` is the data, but within the function you use `Drug_Table_Names` as data. The only thing you do with `drug_flag` is assign the output of `str_detect` to it. –  Jun 30 '19 at 07:11
  • Thank you: Follow up questions/comments: To Rui Barradas, how do I write the function so that I can enter the string I'm looking for as an argument, which was the intent of gen_name? I added the line – Steven Hahn Jun 30 '19 at 17:56
  • continuation of previous comment, ... return, Drug_Table_Names and got a tibble rather than a modification of the original data frame. How do I change the output so that it modifies the data frame? – Steven Hahn Jun 30 '19 at 18:24

1 Answer


As the comments point out, the code in the question complicates something that can be made much simpler.

library(tidyverse)

flagCol <- function(DF, gen_name){
  DF %>%
    mutate(drug_flag = str_detect(DRUG_GENERIC_NAME, gen_name))
}

Test the function with an example data set.

Drug_Table_Names <- letters[1:10]
DRUG_GENERIC_NAME <- LETTERS[1:10] 

df1 <- data.frame(Drug_Table_Names, DRUG_GENERIC_NAME)

flagCol(df1, "G")
#   Drug_Table_Names DRUG_GENERIC_NAME drug_flag
#1                 a                 A     FALSE
#2                 b                 B     FALSE
#3                 c                 C     FALSE
#4                 d                 D     FALSE
#5                 e                 E     FALSE
#6                 f                 F     FALSE
#7                 g                 G      TRUE
#8                 h                 H     FALSE
#9                 i                 I     FALSE
#10                j                 J     FALSE
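The key change from the question's attempt is that gen_name is passed to str_detect without quotes. A minimal sketch of the difference, using made-up drug strings that are not from the question's data:

```r
library(stringr)

# Illustrative values only; not taken from the question's data
x <- c("FLURBIPROFEN 100MG", "IBUPROFEN 200MG")
gen_name <- "FLURBIPROFEN"

# Quoted: searches for the literal text "gen_name", which appears nowhere in x
quoted <- str_detect(x, "gen_name")

# Unquoted: searches for the value stored in the variable gen_name
unquoted <- str_detect(x, gen_name)
```

Here quoted is FALSE for both elements, while unquoted is TRUE only for the first.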

Edit

The OP asks two questions in a comment.

Question 2:

I want the name of the new variable, which is drug_flag in your example above, to be the same as the string I look for with str_detect. Is there a way to add an argument to the function so that it will accept the same character string as the value of gen_name?

Yes, there is. The function below does that, and it only needs package stringr loaded, not dplyr. It accepts an extra argument, DRUG_COL, which defaults to "DRUG_GENERIC_NAME" and gives the name of the column in which to look for gen_name.

flagCol2 <- function(DF, gen_name, DRUG_COL = "DRUG_GENERIC_NAME"){
  DF[[gen_name]] <- str_detect(DF[[DRUG_COL]], gen_name)
  DF
}

flagCol2(df1, "G")
#   Drug_Table_Names DRUG_GENERIC_NAME     G
#1                 a                 A FALSE
#2                 b                 B FALSE
#3                 c                 C FALSE
#4                 d                 D FALSE
#5                 e                 E FALSE
#6                 f                 F FALSE
#7                 g                 G  TRUE
#8                 h                 H FALSE
#9                 i                 I FALSE
#10                j                 J FALSE
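For those who prefer to stay within dplyr, the `:=` operator mentioned in the comments can set the column name from a string. The following is only a sketch: the name flagCol2_dplyr and the small drugs data frame are hypothetical, and it assumes a dplyr version that supports the `.data` pronoun.

```r
library(dplyr)
library(stringr)

# Hypothetical dplyr counterpart of flagCol2; not part of the original answer
flagCol2_dplyr <- function(DF, gen_name, DRUG_COL = "DRUG_GENERIC_NAME"){
  DF %>%
    # !!gen_name := uses the string in gen_name as the new column's name;
    # .data[[DRUG_COL]] looks up the column whose name is stored in DRUG_COL
    mutate(!!gen_name := str_detect(.data[[DRUG_COL]], gen_name))
}

# Made-up example data
drugs <- data.frame(DRUG_GENERIC_NAME = c("FLURBIPROFEN", "IBUPROFEN"))
res <- flagCol2_dplyr(drugs, "FLURBIPROFEN")
```

The result res should have a second column named FLURBIPROFEN that is TRUE in the first row and FALSE in the second.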

Question 1 (my emphasis):

It currently returns a tibble and I want it to add the new variable to the original data frame which is Drug_Table_Names. How do I modify the function to do that.

There is no need to modify the functions; both flagCol and flagCol2 already do that. All that needs to be done is to assign the result back to the original data frame.

df1 <- flagCol(df1, "G")

Or the second version.

df1 <- flagCol2(df1, "G")
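The reason the assignment is needed is that an R function works on its own copy of DF, so the caller's data frame is unchanged until the result is assigned back. A small sketch of this, with flagCol2 repeated so the block runs on its own and a made-up drugs data frame:

```r
library(stringr)

# Same definition as flagCol2 above, repeated to keep the example self-contained
flagCol2 <- function(DF, gen_name, DRUG_COL = "DRUG_GENERIC_NAME"){
  DF[[gen_name]] <- str_detect(DF[[DRUG_COL]], gen_name)
  DF
}

# Made-up example data
drugs <- data.frame(DRUG_GENERIC_NAME = c("FLURBIPROFEN", "IBUPROFEN"))

# Calling without assignment leaves drugs untouched
invisible(flagCol2(drugs, "IBUPROFEN"))
cols_without_assign <- ncol(drugs)

# Assigning the result back is what adds the column to drugs
drugs <- flagCol2(drugs, "IBUPROFEN")
cols_with_assign <- ncol(drugs)
```

After the first call drugs still has one column; after the assignment it has two.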

Edit 2.

Multiple generic names passed to the function.

flagCol3 <- function(DF, gen_name, DRUG_COL = "DRUG_GENERIC_NAME"){
  for(gn in gen_name){
    DF[[gn]] <- str_detect(DF[[DRUG_COL]], gn)
  }
  DF
}

# There is no generic name "X", but the column is still created (all FALSE)
df3 <- flagCol3(df1, c("B", "G", "X"))
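
One caveat that may or may not matter for your data: str_detect treats its pattern as a regular expression by default, so names containing characters such as parentheses or dots can match more than intended. A possible variant, sketched under that assumption, wraps each name in stringr's fixed() so it is matched literally. The name flagCol3_fixed and the example data are hypothetical.

```r
library(stringr)

# Hypothetical variant of flagCol3 that matches each name as literal text
flagCol3_fixed <- function(DF, gen_name, DRUG_COL = "DRUG_GENERIC_NAME"){
  for(gn in gen_name){
    # fixed() turns off regex interpretation of gn
    DF[[gn]] <- str_detect(DF[[DRUG_COL]], fixed(gn))
  }
  DF
}

# Made-up example data containing regex metacharacters
drugs <- data.frame(DRUG_GENERIC_NAME = c("CODEINE (30MG)", "CODEINE 30MG"))

# As a regex, "(30MG)" is a capture group and matches "30MG" in both rows
as_regex <- str_detect(drugs$DRUG_GENERIC_NAME, "(30MG)")

# With fixed(), only the row containing the literal parentheses matches
res <- flagCol3_fixed(drugs, "(30MG)")
```

In this example as_regex is TRUE for both rows, whereas the "(30MG)" column of res is TRUE only for the first.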
Rui Barradas
  • Thank you, this works as advertised. Questions: 1) It currently returns a tibble and I want it to add the new variable to the original data frame which is Drug_Table_Names. How do I modify the function to do that. 2) I want the name of the new variable, which is drug_flag in your example above, to be the same as the string I look for with str_detect. Is there a way to add an argument to the function so that it will accept the same character string as the value of gen_name? – Steven Hahn Jun 30 '19 at 23:29
  • This works perfectly, thank you very much. There is one more thing I would like to be able to do with this function: Is there a way to send a list of names for the argument gen_name so that the function can run a series of names? – Steven Hahn Jul 01 '19 at 19:04
  • @StevenHahn Edit 2 implements that, function `flagCol3`. – Rui Barradas Jul 01 '19 at 19:26