-1

I am learning to use R and stringr. Let's say I have a string:

a <- 'fda afe faref a about fae faef across afef absolute fgprg'

I have a data frame which includes some words that I want to remove from a:

b <- tibble(words=c('a','about','across'))

In my real data, b will probably contain many words; this is just an example.

I want to remove all the words in b from a, using stringr or any other R function if there is a better option. The result I am hoping for is:

'fda afe faref fae faef afef absolute fgprg'
halfer
Feng Chen

3 Answers

3

You can do:

gsub(paste0("\\b", paste0(b$words, collapse = "\\b( )?|\\b"), "\\b( )?"), "", a)
# [1] "fda afe faref fae faef afef absolute fgprg"

\\b indicates a word boundary, and | separates the alternative words to match. ( )? optionally matches a single space after the word, so that the space is removed as well.

So we are matching the following expression in gsub:

paste0("\\b", paste0(b$words, collapse = "\\b( )?|\\b"), "\\b( )?")
# [1] "\\ba\\b( )?|\\babout\\b( )?|\\bacross\\b( )?"

Or with stringr:

library(stringr)
str_replace_all(a, str_c("\\b", str_c(b$words, collapse = "\\b( )?|\\b"), "\\b( )?"), "")
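
If a removed word is the last word in the string, a stray space can be left behind. Here is a minimal sketch of one way to handle that, using base R only. Note that remove_words is a hypothetical helper name, and b is built with data.frame instead of tibble only to keep the example self-contained.

```r
# Hypothetical helper (not from the answer above): remove whole words,
# then tidy up any whitespace left behind.
remove_words <- function(text, words) {
  # One alternation wrapped in word boundaries: \b(a|about|across)\b
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", text)
  # Collapse runs of spaces into one and trim both ends
  trimws(gsub(" +", " ", out))
}

a <- 'fda afe faref a about fae faef across afef absolute fgprg'
b <- data.frame(words = c('a', 'about', 'across'))  # assumption: data.frame stands in for tibble

remove_words(a, b$words)
# [1] "fda afe faref fae faef afef absolute fgprg"
```

This assumes the words contain no regex metacharacters; anything like "c++" would need escaping first.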
kath
1

You could use gsub. For the single letter a, we need to mark word boundaries with the regex \b, otherwise it would also match the letter a inside other words.

gsub("\\ba \\b|about |across ", "", a)
# [1] "fda afe faref fae faef afef absolute fgprg"
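
To illustrate why the word boundaries matter, here is a small sketch on a made-up string (x is an assumed example, not from the question):

```r
x <- "a banana"

# Without boundaries, every letter "a" is removed
no_boundary <- gsub("a", "", x)
no_boundary
# [1] " bnn"

# With \\b on both sides, only the standalone word "a" is removed
with_boundary <- gsub("\\ba\\b", "", x)
with_boundary
# [1] " banana"
```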
jay.sf
0

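You can use gsub, where \\b marks the beginning or end of a word and the leading " *" removes any spaces in front of the word. Only the spaces in front are removed, so that two adjacent matches do not swallow the spaces on both sides and glue the neighbouring words together.

gsub(paste0(" *\\b", b, "\\b", collapse = "|"), "", a)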
#[1] "fda afe faref fae faef afef absolute fgprg"

Data:

a <- 'fda afe faref a about fae faef across afef absolute fgprg'
b <- c('a','about','across')
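
If the words are always separated by single spaces, a regex may not be needed at all. As an alternative sketch (a different technique from the gsub call above, and resting on that single-space assumption), you can split the string, drop the unwanted tokens with %in%, and paste the rest back together:

```r
a <- 'fda afe faref a about fae faef across afef absolute fgprg'
b <- c('a', 'about', 'across')

# Split on literal single spaces (assumption: no tabs or repeated spaces)
tokens <- strsplit(a, " ", fixed = TRUE)[[1]]

# Keep only the tokens that are not listed in b
kept <- tokens[!tokens %in% b]

result <- paste(kept, collapse = " ")
result
# [1] "fda afe faref fae faef afef absolute fgprg"
```

Because the comparison is on whole tokens, there is no risk of matching part of a longer word such as "absolute".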
GKi