How to use R, stringr or other package to replace a group of words from a long string?

Question

I am learning to use R and stringr. Let's say I have a string:

a <- 'fda afe faref a about fae faef across afef absolute fgprg'

I have a data frame which includes some words that I want to remove from a:

library(tibble)
b <- tibble(words = c('a', 'about', 'across'))

In my real data, b will probably contain many more words; this is just an example.

I want to remove all the words in b from a, using base R, stringr, or any other function that works better. The result should be:

'fda afe faref fae faef afef absolute fgprg'

kath · Accepted Answer · 2019-10-24T06:40:40.533

You can do:

gsub(paste0("\\b", paste0(b$words, collapse = "\\b( )?|\\b"), "\\b( )?"), "", a)
# [1] "fda afe faref fae faef afef absolute fgprg"

\\b marks a word boundary, and | separates the alternative words to match. ( )? optionally matches one space after the word, so that space is removed as well.

So we are matching the following expression in gsub:

paste0("\\b", paste0(b$words, collapse = "\\b( )?|\\b"), "\\b( )?")
# [1] "\\ba\\b( )?|\\babout\\b( )?|\\bacross\\b( )?"
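If the real b$words can contain regex metacharacters (for example "." or "+"), pasting them straight into the pattern changes what it matches. A minimal sketch, assuming that case matters: escape_regex and build_pattern are hypothetical helper names (not part of base R), and the sketch switches to perl = TRUE because PCRE treats a backslash followed by any non-alphanumeric character as that literal character. It also groups the words once inside \\b(?:...)\\b instead of repeating \\b around each word.

```r
# Hypothetical helper: backslash-escape every ASCII punctuation character so
# each word is matched literally by PCRE
escape_regex <- function(x) gsub("([[:punct:]])", "\\\\\\1", x, perl = TRUE)

# Hypothetical helper: boundary, any listed word, boundary, optional space
build_pattern <- function(words) {
  paste0("\\b(?:", paste(escape_regex(words), collapse = "|"), ")\\b ?")
}

a <- 'fda afe faref a about fae faef across afef absolute fgprg'
gsub(build_pattern(c('a', 'about', 'across')), "", a, perl = TRUE)
# [1] "fda afe faref fae faef afef absolute fgprg"

# The "." in "a.b" is escaped, so "axb" is not treated as a match
gsub(build_pattern("a.b"), "", "axb a.b c", perl = TRUE)
# [1] "axb c"
```

Note that \\b needs a word character on one side, so a word that ends in punctuation (such as "c++") would still not be matched by this boundary-based pattern.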

Or with stringr:

library(stringr)
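# the collapse string must start each following word with \\b, as in the gsub version
str_replace_all(a, str_c("\\b", str_c(b$words, collapse = "\\b( )?|\\b"), "\\b( )?"), "")
# [1] "fda afe faref fae faef afef absolute fgprg"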
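A variant sketch with stringr, assuming stringr 1.3.0 or later (where str_remove_all and str_squish are available): remove only the words, then let str_squish collapse the leftover runs of spaces and trim the ends. This also covers a listed word at the very end of the string, where there is no following space to strip. The words are assumed to contain no regex metacharacters.

```r
library(stringr)

a <- 'fda afe faref a about fae faef across afef absolute fgprg'
words <- c('a', 'about', 'across')  # e.g. b$words

# One group of alternatives between word boundaries: \b(a|about|across)\b
pattern <- str_c("\\b(", str_c(words, collapse = "|"), ")\\b")

# Remove the words, then squeeze repeated spaces and trim both ends
str_squish(str_remove_all(a, pattern))
# [1] "fda afe faref fae faef afef absolute fgprg"

str_squish(str_remove_all("keep this across", pattern))
# [1] "keep this"
```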

jay.sf · Answer 2 · 2019-10-24T06:36:15.473


You could use gsub. For the single-letter word a, we need to mark word boundaries with \b in the regex; otherwise "a " would also match the end of "fda".

gsub("\\ba \\b|about |across ", "", a)
# [1] "fda afe faref fae faef afef absolute fgprg"
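The sketch below shows why the boundary matters and how the same kind of pattern could be generated from a word vector (such as b$words) instead of being typed by hand. It assumes the words contain no regex metacharacters and, like the hard-coded pattern, requires a space after each word, so a listed word at the very end of the string would be kept.

```r
a <- 'fda afe faref a about fae faef across afef absolute fgprg'

# Without a word boundary, "a " also matches the last letter of "fda"
gsub("a |about |across ", "", a)
# [1] "fdafe faref fae faef afef absolute fgprg"

# Pattern generated from a word vector: \b(a|about|across)\b followed by a space
words <- c('a', 'about', 'across')  # e.g. b$words
gsub(paste0("\\b(", paste(words, collapse = "|"), ")\\b "), "", a)
# [1] "fda afe faref fae faef afef absolute fgprg"
```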

edited Oct 24 '19 at 06:36

answered Oct 24 '19 at 06:30


GKi · Answer 3 · 2019-10-24T08:35:34.853


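You can use gsub, where \\b marks the beginning or end of a word and " *" removes the spaces that follow the word, if there are any. Only the trailing spaces are matched: a leading " *" as well would also swallow the space before the word and join its neighbours (e.g. "faref" and "fae").

gsub(paste0("\\b", b, "\\b *", collapse = "|"), "", a)
#[1] "fda afe faref fae faef afef absolute fgprg"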

Data:

a <- 'fda afe faref a about fae faef across afef absolute fgprg'
b <- c('a','about','across')
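For a very long word list, a different technique (not from the answers above) avoids building a huge regex: split the string into tokens, drop tokens found in the list with %in%, and paste the rest back together. A minimal sketch, assuming words are separated by whitespace and that a token with attached punctuation (e.g. "about,") should not count as a match; remove_words is a hypothetical helper name.

```r
a <- 'fda afe faref a about fae faef across afef absolute fgprg'
b <- c('a', 'about', 'across')

# Hypothetical helper: works on a character vector, one result per element
remove_words <- function(x, words) {
  tokens <- strsplit(x, "\\s+")  # split each string on runs of whitespace
  vapply(tokens,
         function(tok) paste(tok[!tok %in% words], collapse = " "),  # keep unlisted tokens
         character(1))
}

remove_words(a, b)
# [1] "fda afe faref fae faef afef absolute fgprg"
```

Because %in% is a hash-based lookup, this scales with the number of tokens rather than the number of alternatives in a pattern.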

edited Oct 24 '19 at 08:35

answered Oct 24 '19 at 06:34

