
I have a list of records with IDs, some of which are usernames and some of which are email addresses. I'd like to know how many are email addresses. I was thinking an easy way to do this would be to count how many of the rows contain the @ symbol, but I can't get a function to work to do this. Any help is appreciated!

Sample dataset:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")
Steven Beaupré
T D
  • See also http://stackoverflow.com/questions/19341554/regular-expression-in-base-r-regex-to-identify-email-address – Sam Firke May 04 '15 at 14:19

3 Answers


Both answers so far are entirely correct, but if you're looking for an email address, a method that's less likely to have false positives is:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")  
sum(regexpr(".*@.*\\..*",x) != -1)
Eric Brooks
  • You could even go further and require ".com", ".edu", etc., although then you risk false negatives. – Eric Brooks May 04 '15 at 14:06
  • Good thinking... Though more like `sum(regexpr(".*@.*\\..*",x) != -1)` probably, to match the OP's desired output. A similar approach could be `sum(sub(".*(@).*\\..*", "\\1", x) == "@")` – David Arenburg May 04 '15 at 14:07
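Taking the "fewer false positives" idea one step further, here is a sketch using an anchored pattern that also rejects IDs containing spaces or more than one "@". The pattern and the extra test strings (`"me@localhost"`, `"a b@c.d"`, `"x@y@z.com"`) are assumptions added for illustration; this is a rough heuristic, not a full RFC 5322 email validator.

```r
# Sample IDs from the question, plus three hypothetical edge cases
x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com",
       "me@localhost", "a b@c.d", "x@y@z.com")

# Assumed heuristic: non-empty local part, one "@", and a domain part that
# contains at least one "." (no spaces or extra "@" allowed anywhere)
email_pattern <- "^[^@[:space:]]+@[^@[:space:]]+\\.[^@[:space:]]+$"

is_email <- grepl(email_pattern, x)
sum(is_email)                  # stricter pattern: 2

# For comparison, the looser pattern from the answer above
sum(grepl(".*@.*\\..*", x))    # also counts "a b@c.d" and "x@y@z.com": 4
```

As the comment above notes, every extra restriction lowers false positives at the cost of possibly rejecting unusual but valid addresses.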

Try:

x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")
sum(grepl("@", x))
Steven Beaupré
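A small variation on the `grepl()` approach, offered as a sketch rather than part of the original answer: `fixed = TRUE` matches "@" as a literal string instead of a regular expression (the result is identical here, since "@" is not a regex metacharacter), and `mean()` of the logical vector gives the share of IDs that contain "@".

```r
x <- c("1234@aol.com", "johnnyApple", "tomb@gmail.com")

# Logical vector: TRUE where the ID contains a literal "@"
has_at <- grepl("@", x, fixed = TRUE)

sum(has_at)    # number of IDs containing "@": 2
sum(!has_at)   # number of IDs without "@" (usernames): 1
mean(has_at)   # proportion of IDs containing "@"
```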

Assuming your data is in a data frame `df` with the IDs in column `V1`, you can try:

length(grep(pattern="@", df$V1))
[1] 2
Mamoun Benghezal
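To make the data frame version self-contained, the sketch below builds a hypothetical `df` from the question's sample IDs. The column name `V1` is the default that `read.table()` assigns when a file has no header, which is presumably why the answer uses it. The `is_email` column is an illustrative addition, not part of the answer.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(V1 = c("1234@aol.com", "johnnyApple", "tomb@gmail.com"),
                 stringsAsFactors = FALSE)

# grep() returns the indices of matching rows, so length() is the count
length(grep(pattern = "@", df$V1))

# Equivalent: grepl() returns TRUE/FALSE per row, and sum() counts the TRUEs
sum(grepl("@", df$V1))

# Keeping a flag column lets you filter or tabulate the rows later
df$is_email <- grepl("@", df$V1, fixed = TRUE)
df[df$is_email, ]
```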