
I realize this question probably seems painfully simple to most regular expression masters, but reviewing similar questions has not yielded a solution.

I have a vector of e-mail addresses called email and would like to extract the text after the final period in each one. For example:

email<-c("xxxxx1@xxx.com", "xxxx2@xxx.edu", "xxxxx3@xxx.co.uk")

I have tried:

grep("[\.][a-zA-Z]*?$", email, value=T)

This gets me the error message:

Error: '\.' is an unrecognised escape in character string starting ""[\."
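For context: R parses the string literal before the regex engine ever sees it, and "\." is not a valid R string escape, which is why the parser rejects it. To hand the regex \. (a literal period) to grep(), the backslash itself has to be escaped as "\\.". A minimal sketch; the raw-string line assumes R 4.0.0 or later, where r"(...)" literals were introduced:

```r
# In an ordinary R string literal the backslash must itself be escaped,
# so "\\." is the two-character regex \. (a literal period).
pattern <- "\\.[a-zA-Z]*$"
nchar("\\.")   # 2: one backslash and one period

# Raw strings (R >= 4.0.0) take backslashes literally, so no doubling is needed.
raw_pattern <- r"(\.[a-zA-Z]*$)"
identical(raw_pattern, pattern)   # TRUE
```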

Removing the escape character, on the other hand,

grep("[.][a-zA-Z]*?$", email, value=T)

returns the entire e-mail address, as does:

grep("\\.[a-zA-Z]*$", email, perl=T, value=T)
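The reason both attempts return whole addresses is that grep(value = TRUE) is a filter: it returns the elements that contain a match, not the matched text. One way to pull out only the matched substring is regexpr() plus regmatches(), sketched below on the example vector from the question; the pattern "[^.]+$" (the run of non-period characters at the end) is an illustrative choice, not taken from either answer.

```r
email <- c("xxxxx1@xxx.com", "xxxx2@xxx.edu", "xxxxx3@xxx.co.uk")

# grep(value = TRUE) returns every *element* that matches,
# so all three complete addresses come back.
grep("\\.[a-zA-Z]*$", email, value = TRUE)

# regexpr() records where the match sits in each element;
# regmatches() then extracts just that matched text.
tld <- regmatches(email, regexpr("[^.]+$", email))
tld   # "com" "edu" "uk"
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input when some strings do not match the pattern.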

I'd really appreciate help at this point.

Ronak Shah
user2230555

2 Answers


If you need to extract the string after the last period (.), try sub:

sub('.*\\.', '', email)
#[1] "com" "com"

data

email <- c('akrun.123@gmail.com', 'xxx$xxxx.com')
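The approach works because ".*" is greedy: it consumes as much as it can, so the following "\\." can only match the last period, and everything up to and including it is replaced with "". The sketch below reruns it on the question's vector and, for contrast, shows a lazy ".*?" (PCRE via perl = TRUE), which stops at the first period instead; the "localhost" input is a hypothetical string with no period.

```r
email <- c("xxxxx1@xxx.com", "xxxx2@xxx.edu", "xxxxx3@xxx.co.uk")

# Greedy ".*" runs as far as possible, so "\\." matches the LAST period.
greedy <- sub(".*\\.", "", email)
greedy   # "com" "edu" "uk"

# Lazy ".*?" stops as early as possible, so "\\." matches the FIRST period.
lazy <- sub(".*?\\.", "", email, perl = TRUE)
lazy     # "com" "edu" "co.uk"

# With no period at all nothing matches, so sub() returns the input unchanged.
sub(".*\\.", "", "localhost")   # "localhost"
```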
akrun
Looking now, I prefer your answer! Also going to add that file_ext uses a slightly different approach: https://github.com/wch/r-source/blob/d28b1e480fed4e00ea85f61becb73527bd6e7c7f/src/library/tools/R/utils.R#L23 – MichaelChirico Mar 13 '17 at 05:05

Try

email <- c("michael.chirico@some.isp.com", "xxx@xxxx.com")
sapply(strsplit(email, split = ".", fixed = TRUE), tail, 1L)

# [1] "com" "com"
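A type-stable variant of the same split-and-take-last idea uses vapply(), which declares that each result is a single string. If some element ever produced zero pieces, vapply() would stop with an error, whereas sapply() would quietly fall back to returning a list. This is a sketch on the question's vector, not part of the original answer:

```r
email <- c("xxxxx1@xxx.com", "xxxx2@xxx.edu", "xxxxx3@xxx.co.uk")

# strsplit() gives a list with one character vector of pieces per address.
parts <- strsplit(email, split = ".", fixed = TRUE)

# Take the last piece of each; character(1) asserts one string per element.
last_piece <- vapply(parts, function(p) p[length(p)], character(1))
last_piece   # "com" "edu" "uk"
```

One behavioural difference from the sub() answer: strsplit() drops a trailing empty piece, so a hypothetical input ending in a period, such as "user@host.", yields "user@host" here, while sub(".*\\.", "", x) yields "".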

Also, as pointed out by @RichardScriven, the tools package has a function tailor-made for exactly this:

library(tools)
file_ext(email)
# [1] "com" "com"
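As the comment under the other answer notes, file_ext() works slightly differently: it only accepts an alphanumeric run after the final period and returns "" otherwise. The sketch below compares it with sub() on hypothetical inputs chosen to expose the difference (a string with no period, and one whose last segment contains an underscore):

```r
library(tools)

# Hypothetical inputs, picked only to show where the two approaches diverge.
x <- c("xxx@xxxx.com", "localhost", "user@example.c_d")

# file_ext(): an alphanumeric run must follow the final "."; otherwise "".
ext_tools <- file_ext(x)
ext_tools   # "com" "" ""

# sub(): keeps whatever follows the last "."; with no "." the input is unchanged.
ext_sub <- sub(".*\\.", "", x)
ext_sub     # "com" "localhost" "c_d"
```

For ordinary domain names the two agree; which one is preferable depends on whether an empty string or the untouched input is the better result for malformed addresses.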
MichaelChirico