I have a vector of strings that looks something like this: `c("abc@40gmail.com", "xyz@50gmail.com")`. For some reason, there are random digits after the @, and I'm trying to remove them. Using a regular expression, how can I tell R to remove or replace the digits that come after the "@", so I end up with `c("abc@gmail.com", "xyz@gmail.com")`? I don't know much about regex, so I'd really appreciate it if someone could provide not just the code but also a brief explanation of it. Thanks!

hsl
-
@Thomas how is that a dupe? From now on, is every text replacement question a dupe of `gsub("e", "", x)`? The regex in the "dupe" is an exact match, while the one in this question is a bit more complicated. – David Arenburg May 17 '15 at 17:23
2 Answers
One option is
x <- c("abc@40gmail.com", "xyz@50gmail.com")
sub("@\\d+", "@", x)
## [1] "abc@gmail.com" "xyz@gmail.com"
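
To unpack the pattern: `@` matches a literal at-sign, `\\d` (a doubled backslash inside an R string) matches a single digit, and `+` means "one or more". Because the `@` is part of the match, the replacement has to put it back. The following is a minimal sketch rather than part of the original answer; the names `cleaned` and `cleaned2` are chosen here for illustration, and the capture-group variant is just one equivalent way to write it.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# "@\\d+" matches the "@" plus the run of digits right after it;
# replacing the whole match with "@" restores the at-sign.
cleaned <- sub("@\\d+", "@", x)
print(cleaned)

# Equivalent form with a capture group: "(@)" is group 1,
# and "\\1" in the replacement refers back to it.
cleaned2 <- sub("(@)\\d+", "\\1", x)
print(identical(cleaned, cleaned2))
```

`sub()` replaces only the first match in each string, which is enough when each address contains a single @; `gsub()` would replace every match.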

David Arenburg
You could use a positive lookbehind or \K:
sub("(?<=@)\\d+", "", x, perl=TRUE)
`\\d+` matches one or more digit characters, and `(?<=@)` forces the regex engine to look immediately after the @ symbol, so only digits directly following it are matched. Since lookarounds are a PCRE feature, you need to set the perl=TRUE parameter.
OR
sub("@\\K\\d+", "", x, perl=TRUE)
Here \K drops the already-matched @ from the match, so only the digits are replaced.
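
A quick way to see that the two PCRE patterns behave the same is to run both on the question's vector and compare the results. This is only a sketch, assuming the same `x` as in the question; `res_lookbehind` and `res_keep` are names introduced here for illustration.

```r
x <- c("abc@40gmail.com", "xyz@50gmail.com")

# Lookbehind: the "@" must precede the digits but is never part of the match.
res_lookbehind <- sub("(?<=@)\\d+", "", x, perl = TRUE)

# \K: the "@" is matched first, then \K discards it from the reported match,
# so only the digits are replaced by the empty string.
res_keep <- sub("@\\K\\d+", "", x, perl = TRUE)

print(res_lookbehind)
print(identical(res_lookbehind, res_keep))
```

Both variants use an empty replacement because the @ is never consumed by the match, unlike the plain `"@\\d+"` pattern, which has to put it back.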

Avinash Raj
-
Thanks a lot! Is there any reason why you wouldn't just use the simpler `sub("@\\d+", "@", x)`? – hsl May 17 '15 at 15:55
-
@hsl because it's already mentioned. We could write at least two answers for a single regex-based question. :-) That's the beauty of regex. – Avinash Raj May 17 '15 at 15:57