4

I want to replace all words containing the symbol @ with a specific word. I am using gsub, and therefore am applying it to a character vector. The issue that keeps occurring is that when I use:

gsub(".*@.*", "email", data) 

all of the text in that portion of the character vector gets deleted.

There are multiple different emails all with different lengths so I can't set the characters prior and characters after to a specific number.

Any suggestions?

I've done my fair share of reading about regex but everything I tried failed.

Here's an example:

data <- c("This is an example. Here is my email: emailaddress@help.com. Thank you")

data <- gsub(".*@.*", "email", data)

it returns [1] "email"

when I want [1] "This is an example. Here is my email: email. Thank you"

user3772674
  • Welcome to Stack Overflow! Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. – Ben Bolker Jun 24 '14 at 20:12
  • `.*` matches all characters; perhaps you want `[^\s]*@[^\s]*` – Fabricator Jun 24 '14 at 20:14
  • If you just want to replace the "@" sign, use `gsub("@", "email", data)`; otherwise anything else the pattern matches will be replaced as well. – MrFlick Jun 24 '14 at 20:15
  • @user3772674 When adding new information, it's better to edit your original question than to add additional information in the comments. – MrFlick Jun 24 '14 at 20:16

2 Answers

6

You can use the following:

gsub('\\S+@\\S+', 'email', data)

Explanation:

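\S matches any non-whitespace character. The pattern therefore matches one or more non-whitespace characters, followed by a literal @, followed by one or more non-whitespace characters. Unlike .* it cannot cross a space, so only the word containing the @ is replaced rather than the whole string.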
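To make the difference concrete, here is a small sketch that runs both patterns on the string from the question. The variable names `greedy` and `token` are only illustrative. Note that, as far as I can tell, `\\S+` also picks up the period right after the address, because a period is a non-whitespace character too.

```r
data <- c("This is an example. Here is my email: emailaddress@help.com. Thank you")

# Show what each pattern actually matches in the string
regmatches(data, regexpr(".*@.*", data))      # the entire string
regmatches(data, regexpr("\\S+@\\S+", data))  # only the token around the @

# .* can span spaces, so the whole element is replaced
greedy <- gsub(".*@.*", "email", data)
greedy
# [1] "email"

# \\S+ stops at whitespace, so only the address token is replaced
# (the period that followed the address is part of the token and goes with it)
token <- gsub("\\S+@\\S+", "email", data)
token
# [1] "This is an example. Here is my email: email Thank you"
```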

hwnd
2

To replace strings with an embedded "@" in R, you can use the following (translating @Fabricator's pattern to R):

data <- c("This is an example. Here is my email: emailaddress@help.com")
data <- gsub("[^\\s]*@[^\\s]*", "email", data, perl = TRUE)  # assign the result back so data is updated
data
# [1] "This is an example. Here is my email: email"
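If the punctuation that follows an address should be kept (as in the output the question asks for), one possible variation is to force the match to end on a word character. This is only a sketch under the assumption that addresses end in a letter, digit, or underscore; the pattern and the names `texts` and `cleaned` are illustrative and not taken from either answer. It does not handle punctuation in front of the address, such as an opening parenthesis.

```r
# Hypothetical sample input: the question's string plus two extra cases
texts <- c(
  "This is an example. Here is my email: emailaddress@help.com. Thank you",
  "Write to a@b.org, or c.d@example.co.uk!",
  "No address here"
)

# \\S+@   : non-whitespace characters up to the @
# \\S*\\w : more non-whitespace characters, but the match must end on a
#           word character, so a trailing "." "," or "!" is left in place
cleaned <- gsub("\\S+@\\S*\\w", "email", texts, perl = TRUE)
cleaned
# [1] "This is an example. Here is my email: email. Thank you"
# [2] "Write to email, or email!"
# [3] "No address here"
```

Because gsub is vectorised, every element of the character vector is processed, and elements without an @ are returned unchanged.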
MrFlick