
I would like to remove multiple words from a string in R, but would like to use a character vector instead of a regexp.

For example, if I had the string

"hello how are you" 

and wanted to remove

c("hello", "how")

I would like the result to be

" are you"

I can get close with str_remove() from stringr:

library(stringr)  # stringr also re-exports the %>% pipe
"hello how are you" %>% str_remove(c("hello","how"))
[1] " how are you"   "hello  are you"

But this returns one string per word, and I'd need an extra step to combine them into a single string. Is there a function that does all of this in one call?

tmfmnk
RayVelcoro
  • Related: [Removing words featured in character vector from string](https://stackoverflow.com/questions/35790652/removing-words-featured-in-character-vector-from-string) – Henrik May 10 '19 at 19:06

2 Answers


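We can collapse the words with `|` so they are evaluated as a single regex alternation (OR), then remove every match with `str_remove_all()`: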

library(stringr)
library(magrittr)
pat <- str_c(words, collapse="|")
"hello how are you" %>% 
      str_remove_all(pat) %>%
      trimws
#[1] "are you"

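data (run this before the code above; otherwise `words` refers to the sample word list that ships with stringr)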

words <- c("hello", "how")
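One caveat with a pattern built by `str_c(words, collapse = "|")`: each word is still a regex, so it can match inside longer words (`"how"` also matches in `"show"`), and characters such as `.` or `+` keep their regex meaning. Below is a minimal sketch of a stricter variant. `remove_words()` is a hypothetical helper name, and it assumes each word starts and ends with a letter or digit so that the `\\b` word boundaries apply. Each word is wrapped in ICU's `\\Q...\\E` quoting so it is matched literally, and `str_squish()` tidies the leftover spaces.

```r
library(stringr)

# Hypothetical helper: remove whole words only, matching each entry of
# `words` literally rather than as a regular expression.
remove_words <- function(x, words) {
  # \\Q...\\E quotes each word; \\b requires a word boundary on both sides,
  # so "how" is not removed from inside "show".
  pat <- str_c("\\b(?:", str_c("\\Q", words, "\\E", collapse = "|"), ")\\b")
  # Drop the matches, then collapse runs of spaces and trim the ends.
  str_squish(str_remove_all(x, pat))
}

remove_words("hello how are you", c("hello", "how"))
#[1] "are you"
```

Because the words are quoted, `remove_words("costs 1.5 or 105", "1.5")` removes only the literal `1.5` and leaves `105` alone.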
akrun
  • Improvement suggestion: use paste() with collapse = "|" on a vector of words, so you don't have to retype everything? – Wimpel May 10 '19 at 18:49
  • Great idea -- very clever! I am surprised that something like this isn't implemented in one of the strings packages, but this is a simple workaround. – RayVelcoro May 10 '19 at 18:55
  • For `str_remove` the default interpretation is `regex`. You can wrap the pattern with `fixed()` though. But the issue is that stringr recycles the string and the pattern to a common length, so each word is applied separately and you get one result per word: `"hello how are you" %>% str_remove_all(fixed(words))` gives `[1] " how are you"   "hello  are you"` – akrun May 10 '19 at 18:56
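Following up on the `fixed()` comment: one way to get a single string while still treating the character vector literally is to apply `str_remove_all()` once per word and feed each result into the next call with base R's `Reduce()`. This is a sketch, not a built-in stringr feature. Like the regex version, it also removes substrings (e.g. `"how"` inside `"show"`), and the order of `words` can matter when one word contains another.

```r
library(stringr)

words <- c("hello", "how")

# Reduce() calls the function with the running result and the next word:
# first str_remove_all("hello how are you", fixed("hello")), then the same
# call on that output with fixed("how"). fixed() matches each word literally.
out <- Reduce(function(s, w) str_remove_all(s, fixed(w)), words,
              init = "hello how are you")
out
#[1] "  are you"
trimws(out)
#[1] "are you"
```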

A base R possibility could be:

x <- "hello how are you"
# Replace every match with the empty string, then trim the leftover spaces.
trimws(gsub("hello|how", "", x))

[1] "are you"

Or, if you have more words, use the idea proposed by @Wimpel and build the pattern with `paste(..., collapse = "|")`:

pat <- paste(c("hello", "how"), collapse = "|")  # "hello|how"
trimws(gsub(pat, "", x))
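If the goal is to drop whole words using the character vector directly, without building a pattern from it at all, another base R option is to split the string into whitespace-separated tokens and keep only those not in the vector. A minimal sketch, assuming words are separated by whitespace; `remove_words_base()` is a hypothetical helper name. Punctuation attached to a word (e.g. `"how,"`) makes it a different token, so it would be kept.

```r
x <- "hello how are you"
words <- c("hello", "how")

# Hypothetical helper: split each string on runs of whitespace, keep the
# tokens that are not in `words`, and rejoin them with single spaces.
remove_words_base <- function(x, words) {
  tokens <- strsplit(x, "\\s+")
  vapply(tokens, function(t) paste(t[!t %in% words], collapse = " "),
         character(1))
}

remove_words_base(x, words)
#[1] "are you"
```

Since `%in%` compares whole tokens, `"show"` is left untouched, and the function works on a character vector of several strings at once.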
tmfmnk