I'm looking for a way to use a whitelist containing the digits and the plus sign "+" to remove all other characters from a string.

string <- "opiqr8929348t89hr289r01++r42+3525"

I tried first to use:

gsub("[[:punct:][:alpha:]]", "", string)

but this also removes the "+":

# [1] "89293488928901423525"

How can I exclude the "+" from [:punct:]?

So my intention is to use a whitelist instead:

whitelist <- c("0123456789+")

Is there a way to use gsub() the other way around, so that the whitelist identifies the characters that should remain?

jay.sf
  • What is your expected output? – s_baldur Sep 30 '20 at 14:09
  • Yes, you are right, I copied an example that is not representative of all cases. I have other punctuation like "-", ".", "/", ... in the dataset that I want to exclude, hence the attempt with [:punct:], but I can't find a way to exclude the "+". – Johannes Stephan Sep 30 '20 at 14:09

1 Answer

What about this:

string <- "opiqr8929348t89hr289r01++r42+3525"
gsub("[^0-9+]", "", string)
# [1] "89293488928901++42+3525"

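This replaces every character that is not a digit (0-9) or a plus sign with "". Inside a bracket expression, a leading ^ negates the set, and + is just a literal character there.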
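If you would rather keep the whitelist in a variable, one possible sketch is to build the negated character class from it with paste0(). The helper name keep_only is made up for illustration (it is not a base R function), and the sketch assumes the whitelist contains no characters that are special inside a bracket expression, such as "]", "^", "\" or "-". The last call shows an alternative that keeps the POSIX classes from the question and skips "+" with a lookahead, which needs perl = TRUE.

```r
string <- "opiqr8929348t89hr289r01++r42+3525"
whitelist <- "0123456789+"

# keep_only() is a hypothetical helper, not part of base R.
# Assumption: `whitelist` holds no characters that are special inside
# a bracket expression ("]", "^", "\\", "-").
keep_only <- function(x, whitelist) {
  pattern <- paste0("[^", whitelist, "]")  # here: "[^0123456789+]"
  gsub(pattern, "", x)
}

keep_only(string, whitelist)
# [1] "89293488928901++42+3525"

# Other punctuation is dropped too, because it is not on the whitelist
keep_only("+49-30/12.34", whitelist)
# [1] "+49301234"

# Alternative: keep [:punct:] and [:alpha:], but never match a "+"
gsub("(?!\\+)[[:punct:][:alpha:]]", "", string, perl = TRUE)
# [1] "89293488928901++42+3525"
```

The whitelist version also removes anything that is neither punctuation nor a letter (spaces, for example), whereas the lookahead version only touches what the two POSIX classes cover.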

DaveArmstrong