0

Say, for example, I have two character vectors:

> a
[1] "hi come asap, the show is awsome" "I am suffering from cold"
> d
[1] "asap" "awsome" "cold" "lol" "rofl"

I want to replace any word from "d" that is found in "a" with an empty space. How do I implement this in R?

user1946217

2 Answers

4

Perhaps something like the following would work for you:

a  <- c("hi come asap, the show is awsome", "I am suffering from cold")
d <- c("asap", "awsome", "cold", "lol", "rofl")
d[d %in% gsub("[[:punct:]]", "", unlist(strsplit(a, " ")))] <- " "
d
# [1] " "    " "    " "    "lol"  "rofl"

Or, the opposite way:

a  <- c("hi come asap, the show is awsome", "I am suffering from cold")
d <- c("asap", "awsome", "cold", "lol", "rofl")
gsub(paste(d, collapse = "|"), " ", a)
# [1] "hi come  , the show is  " "I am suffering from  "  
A5C1D2H2I1M1N2O1R2T1
  • Hi, thanks for the quick reply. But I get an error at my end: `Error in gsub(paste(d, collapse = "|"), " ", tweets2) : assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 627`. I know what `gsub` does; can you please explain the significance of `paste(d, collapse = "|")`? – user1946217 Feb 04 '13 at 07:51
  • @user1946217, I believe it's called "alternation" (see [here](http://www.regular-expressions.info/alternation.html)), so `paste(d, collapse = "|")` creates an alternation pattern that looks like `"asap|awsome|cold|lol|rofl"`, meaning look for "asap" or "awsome" or.... Do you get the error on this small example too, or just on your actual data? Do you have any packages loaded that might interfere with base R's `gsub`? – A5C1D2H2I1M1N2O1R2T1 Feb 04 '13 at 08:14
  • Thanks, that explains it. It's working on the small example given here. The error occurs on my actual data. I did not load any packages in R. In my actual data, the vector "d" also has some words with special characters, like "$#!+" and "$$". Is this the problem? – user1946217 Feb 04 '13 at 08:22
  • @user1946217, perhaps, because some of those are special characters and would probably need to be escaped. Special characters usually include `. \ | ( ) [ { ^ $ * + ?` (see `?regex` for more info). – A5C1D2H2I1M1N2O1R2T1 Feb 04 '13 at 08:37
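The escaping idea from the comments above can be sketched as follows. This is only an illustration under assumptions: the entries "$#!+" and "$$" are taken from the comment, the sample strings are made up, and `escape_regex` is a hypothetical helper name (not a base R function) that puts a backslash in front of each regex metacharacter before the alternation pattern is built.

```r
a <- c("hi come asap, the $#!+ show is awsome", "I am suffering from cold $$")
d <- c("asap", "awsome", "cold", "$#!+", "$$")

# Hypothetical helper: prefix every regex metacharacter with a backslash
escape_regex <- function(x) {
  gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", x)
}

# Build the alternation from the escaped words, e.g. "asap|...|\\$\\$"
pattern <- paste(escape_regex(d), collapse = "|")

# Replace every match with a single space
cleaned <- gsub(pattern, " ", a)
cleaned
```

With the words escaped, "$" and "+" are matched as literal characters rather than as an anchor and a quantifier. Whether this also resolves the `tre-compile.c` assertion on the real data is not something this small example can confirm.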
2

I think I understand but could be wrong. You could try:

a  <- c("hi come asap, the $#!+ show is awsome", "I am suffering from cold")
d <- c("asap", "awsome", "cold", "lol", "rofl")

library(qdap)
mgsub(d, "", a)

Yields:

> mgsub(d, "", a)
[1] "hi come , the $#!+ show is" "I am suffering from" 
Tyler Rinker
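For readers who would rather not load an extra package, a roughly similar result can be had in base R. The sketch below is not qdap's implementation; it assumes literal (non-regex) matching is what is wanted, adds "$#!+" to `d` purely for illustration, and uses `fixed = TRUE` so that special characters need no escaping.

```r
a <- c("hi come asap, the $#!+ show is awsome", "I am suffering from cold")
d <- c("asap", "awsome", "cold", "lol", "rofl", "$#!+")

# Remove each entry of d in turn; fixed = TRUE treats it as a literal string
out <- a
for (word in d) {
  out <- gsub(word, "", out, fixed = TRUE)
}

# Collapse leftover runs of spaces and trim both ends
out <- trimws(gsub(" +", " ", out))
out
# [1] "hi come , the show is" "I am suffering from"
```

Note that this removes substrings, not whole words, so "cold" would also be stripped from a longer word that contains it.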