
I am trying to write a function that returns the rows of a dataset in which a given column contains a certain string.

Mock data:

library(stringr)

set.seed(1)
codedata <- data.frame(
  Key = sample(1:10),
  ReadCodePreferredTerm = sample(c("yes", "prefer", "Had refer"), 20, replace=TRUE)
)

User-defined function:

findterms <- function(inputdata, variable, searchterm) {
   outputdata <- inputdata[str_which(inputdata$variable, regex(searchterm, ignore_case=TRUE)), ] 
   return(outputdata)
}

I am expecting at least a couple of rows returned, but I get 0 when I run the following code:

findterms(codedata, ReadCodePreferredTerm, " refer") #the space in front of this word is deliberate

I realise I am trying to do something quite simple... but can't find out why it isn't working.

Note, the code works fine when not defined as a function:

referterms <- codedata[str_which(codedata$ReadCodePreferredTerm, regex(" refer", ignore_case=TRUE)), ]
Dani
  • Probably `inputdata[str_which(inputdata[[variable]], regex(searchterm, ignore_case=TRUE)), ]` – Roland Oct 25 '18 at 10:59
  • This does not work unfortunately. I wonder why the notation '$' is wrong? – Dani Oct 25 '18 at 11:03
  • There are numerous duplicates of *that* question here. You can find the answer in `help("$")`. If you require more help, you'll need to provide a [minimal reproducible example](https://stackoverflow.com/a/5963610/1412059). – Roland Oct 25 '18 at 11:05
  • My apologies, please find my post updated accordingly. – Dani Oct 25 '18 at 11:30
  • My code above works fine if you call the function as `findterms(codedata, "ReadCodePreferredTerm", " refer") ` (as you should do, don't attempt writing a function with non-standard evaluation until you are a bit more advanced). – Roland Oct 25 '18 at 11:46
  • I see, thank you very much for your help and patience ! – Dani Oct 25 '18 at 12:04
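To make the point from the comments concrete, here is a minimal sketch of why `$` fails inside the function and why `[[` works. It reuses the mock data from the question (with `stringsAsFactors = FALSE` added as a precaution for older R versions) and assumes the column name is passed as a string, as Roland suggests.

```r
library(stringr)

set.seed(1)
codedata <- data.frame(
  Key = sample(1:10),
  ReadCodePreferredTerm = sample(c("yes", "prefer", "Had refer"), 20, replace = TRUE),
  stringsAsFactors = FALSE
)

# `$` does not evaluate its right-hand side: this looks for a column
# literally named "variable", which does not exist, so the result is NULL.
is.null(codedata$variable)

# `[[` evaluates its argument, so a column name stored in a variable works.
findterms <- function(inputdata, variable, searchterm) {
  matches <- str_which(inputdata[[variable]], regex(searchterm, ignore_case = TRUE))
  inputdata[matches, ]
}

# The column name is passed as a string.
result <- findterms(codedata, "ReadCodePreferredTerm", " refer")
result
```

Because `inputdata$variable` is `NULL`, `str_which()` finds no matches and the original function returns zero rows rather than raising an error.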

1 Answer

You can do this simply with dplyr and stringr:

library(magrittr) # For the pipe (%>%)
library(dplyr)
library(stringr)
codedata %>%
  dplyr::filter(str_detect(ReadCodePreferredTerm, '\\brefer\\b'))

You can also write your own function if you like. If you don't want to pass the variable name as a string, you will need rlang as well. Something like this works:

library(rlang) 
findterms <- function(df, variable, searchterm) {
  variable <- enquo(variable)
  return(
    df %>%
      dplyr::filter(str_detect(!!variable, str_interp('\\b${ searchterm }\\b')))
  )
}
findterms(codedata, ReadCodePreferredTerm, 'refer')
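As a possible variant, here is a sketch using the embrace operator `{{ }}` in place of `enquo()` and `!!`, assuming dplyr 1.0.0 or later. The name `findterms_embrace` and the small `demo` data frame are made up for illustration. This version keeps the case-insensitive match from the question instead of the word-boundary pattern above.

```r
library(dplyr)
library(stringr)

# Hypothetical name, chosen to avoid clashing with findterms() above.
findterms_embrace <- function(df, variable, searchterm) {
  df %>%
    # {{ variable }} forwards the unquoted column name to filter().
    filter(str_detect({{ variable }}, regex(searchterm, ignore_case = TRUE)))
}

# Small illustrative data frame, not the mock data from the question.
demo <- data.frame(
  Key = 1:3,
  ReadCodePreferredTerm = c("yes", "prefer", "Had refer"),
  stringsAsFactors = FALSE
)

# The leading space means "prefer" is not matched.
out <- findterms_embrace(demo, ReadCodePreferredTerm, " refer")
out
```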
realrbird