How do I gsub() everything except a specified character in R?

In my problem I have the following string...

"the quick brown fox jumps over a lazy dog"

I have to generate a new string by removing all characters except 'r' and 'o', to get the following output...

"roooro"

Assuming all characters are lower case, how do I go about this? I tried...

gsub(!"r","",gsub(!"o","",str1))

But the '!' doesn't work.

Gaurav
  • See [What regex will match every character except comma ',' or semi-colon ';'?](http://stackoverflow.com/questions/1409162/what-regex-will-match-every-character-except-comma-or-semi-colon). – Wiktor Stribiżew Aug 02 '16 at 06:22
  • Actually, I would not call it an exact dupe of the post above since the characters and actual requirements (use of a vector) make this question different (well, not sure if unique :)). – Wiktor Stribiżew Aug 02 '16 at 08:05

1 Answer

We need to put `^` right after the opening `[` to negate the character class. Here, `[^ro]+` matches one or more consecutive characters that are neither 'r' nor 'o', and `gsub` replaces each such run with an empty string ("").

str1 <- "the quick brown fox jumps over a lazy dog"
gsub("[^ro]+", "", str1)
#[1] "roooro"

If we have a vector of values, we can build the pattern with `paste`:

v1 <- c("r", "o")
gsub(paste0("[^", paste(v1, collapse=""), "]+"), "", str1)
#[1] "roooro"
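
One caveat when the class is built from a vector: characters such as `]`, `\`, `^` and `-` have a special meaning inside the brackets. A minimal sketch of one way to guard against that, assuming `perl = TRUE` (PCRE) and a non-empty vector of single characters; `escape_class` and `keep_only` are hypothetical helper names, not part of base R or of the answer above:

```r
# Hypothetical helper: backslash-escape the characters that are
# special inside a PCRE character class ([, ], \, ^ and -).
escape_class <- function(chars) {
  gsub("([\\[\\]\\\\^\\-])", "\\\\\\1", chars, perl = TRUE)
}

# Hypothetical helper: drop every character of x that is not in keep.
# Assumes keep is a non-empty vector of single characters.
keep_only <- function(x, keep) {
  pattern <- paste0("[^", paste(escape_class(keep), collapse = ""), "]+")
  gsub(pattern, "", x, perl = TRUE)
}

keep_only("the quick brown fox jumps over a lazy dog", c("r", "o"))
#[1] "roooro"

# Keep only ']', '-' and the backslash from the input a]r-b\c
keep_only("a]r-b\\c", c("]", "-", "\\"))
#[1] "]-\\"
```

Escaping is done under PCRE here because its rules for backslashes inside a class are well defined; with the default regex engine the same escapes may behave differently.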
akrun
  • Thanks akrun, this works for me. However, is there any way to specify 'r' and 'o' as a vector instead of writing them directly in the gsub command? – Gaurav Aug 02 '16 at 06:16
  • @Gaurav I believe you have a vector of values, in that case use `paste` (updated the post) – akrun Aug 02 '16 at 06:20
  • FYI: With that `paste` approach, one should watch out for `]`, ``\`` and `-` symbols that might mean something different or close the class prematurely. – Wiktor Stribiżew Aug 02 '16 at 06:22
  • How would this work if you would like to remove everything but "ro", so not the two characters separately? – Michael Mar 21 '18 at 11:48
  • @Michael Did you mean `gsub("(ro)(*SKIP)(*FAIL)|.", "", str1, perl = TRUE)`? – akrun Mar 21 '18 at 11:55
  • @akrun Yes that works! Not sure how it works yet, but I guess that is Googleable. – Michael Mar 21 '18 at 12:23
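
For readers wondering how the `(*SKIP)(*FAIL)` pattern from the comments works, here is a rough sketch. `(*SKIP)` and `(*FAIL)` are PCRE backtracking control verbs, so `perl = TRUE` is required. When the first alternative matches "ro", `(*FAIL)` forces the match to fail and `(*SKIP)` makes the engine resume searching after that "ro" instead of retrying inside it. Every other character is matched by `.` and replaced with "". The wrapper name `only_ro` and the second test string are illustrative assumptions:

```r
# Hypothetical wrapper around the pattern from the comment above.
# "ro" is matched and then deliberately skipped; any other single
# character is matched by "." and removed.
only_ro <- function(x) {
  gsub("(ro)(*SKIP)(*FAIL)|.", "", x, perl = TRUE)
}

str1 <- "the quick brown fox jumps over a lazy dog"
only_ro(str1)
#[1] "ro"

# A lone "r" or "o" is removed; only the sequence "ro" survives.
only_ro("roar, robot, or")
#[1] "roro"
```

Note that `.` does not match a newline by default, so newlines in the input would be left in place with this pattern.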