
I'm learning regular expressions in JavaScript and I think there is something I'm missing.

I'm using an example where I'm trying to extract valid email addresses from a string. I'm getting the valid emails but I'm also getting invalid ones. Here's the code:

var teststring = "This is my test string with a valid email: this@that.com, " +
             "and an invalid email: this@broken.1. Pull only the valid email.";

teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+.[A-Za-z]{2,3}/g)

When I run the match method, I get both the valid email "this@that.com" and the invalid email "this@broken.1" returned.

I thought the {2,3} at the end of the last square brackets was supposed to specify that the particular character search within the brackets should only be valid if they contain 2 to 3 instances of the criteria, so why does the broken email with just the "1" after the last dot get returned?

I should also add that I totally understand that this is not a be-all and end-all email validation expression. This is purely a trying-to-understand-regular-expressions question for me. I was searching around for a clear answer but couldn't find exactly what I was looking for.

Thanks

Chris Schmitz
  • I get you're just doing this to learn, but it shouldn't go unnoticed that [validating email addresses with regular expressions is a bad idea](http://stackoverflow.com/a/201378/1675492) – Ingo Bürk Nov 16 '13 at 19:37

3 Answers


An unescaped . matches any character (except line terminators). To match a literal . you need to escape it as \.

teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+\.[A-Za-z]{2,3}/g)
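A minimal sketch comparing the two patterns side by side, assuming the test string from the question is written as a single concatenated string. Running it suggests the unescaped version actually picks up "this@broken" (without the ".1"), because the . ends up consuming a letter of the domain rather than the dot:

```javascript
// Same input as in the question, joined into one valid string literal.
const teststring = "This is my test string with a valid email: this@that.com, " +
  "and an invalid email: this@broken.1. Pull only the valid email.";

// Unescaped dot: "." is allowed to match any single character.
const unescaped = teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+.[A-Za-z]{2,3}/g);

// Escaped dot: "\." only matches a literal "." character.
const escaped = teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+\.[A-Za-z]{2,3}/g);

console.log(unescaped); // ["this@that.com", "this@broken"]
console.log(escaped);   // ["this@that.com"]
```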
OGHaza

You need to escape that last dot. Unescaped, . means "match any character", so for the invalid address the expression is catching these chunks:

this
@
bro
k
en

Here the unescaped . matches the "k", so the match is "this@broken". Try:

teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+\.[A-Za-z]{2,3}/g)
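One way to see those chunks for yourself is to wrap each part of the original (unescaped) pattern in a capture group. This is only an illustrative sketch: the parentheses and the variable name chunks are additions for the demonstration, not part of the original expression.

```javascript
// Each piece of the original pattern gets its own capture group,
// so the result shows which text each piece consumed.
const grouped = /([A-Za-z0-9_+.-]+)(@)([A-Za-z0-9]+)(.)([A-Za-z]{2,3})/;

const result = "this@broken.1".match(grouped);

// result[0] is the whole match; result[1..5] are the five groups.
const chunks = result.slice(1);

console.log(result[0]); // "this@broken"
console.log(chunks);    // ["this", "@", "bro", "k", "en"]
```

The fourth group is the unescaped dot, and it has matched the letter "k" rather than a literal dot.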
Paul Roub
  • Awesome, Thanks! Also thanks for adding the chunks in your reply. That definitely helped me visualize what was happening. – Chris Schmitz Nov 16 '13 at 19:54

Since you're looking for both uppercase and lowercase letters, you can simplify your search by making it case-insensitive. The g flag at the end of the regular expression makes the match global (i.e. it returns all matches); the i flag makes it case-insensitive. You could use /i on its own if you wanted only the first match and didn't care about case. For example,

"CaSe InSeNsItIvE iS cOoL; I lOvE cAsEs".match(/case/i)

returns the one-element array ["CaSe"]. To get all case-insensitive matches, just use /gi:

"CaSe InSeNsItIvE iS cOoL; I lOvE cAsEs".match(/case/gi)

returns ["CaSe", "cAsE"]

Your query can be shortened to

teststring.match(/[A-Z0-9_+.-]+@[A-Z0-9]+\.[A-Z]{2,3}/gi)
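As a quick check, here is a sketch that runs the explicit and the case-insensitive versions against the test string from the question and against a mixed-case address. The mixed-case input is a made-up example added for illustration.

```javascript
// Same input as in the question, joined into one valid string literal.
const teststring = "This is my test string with a valid email: this@that.com, " +
  "and an invalid email: this@broken.1. Pull only the valid email.";

// Explicit upper and lower ranges, global flag only.
const explicit = teststring.match(/[A-Za-z0-9_+.-]+@[A-Za-z0-9]+\.[A-Za-z]{2,3}/g);

// Upper-case ranges only, with the i flag covering lower case.
const insensitive = teststring.match(/[A-Z0-9_+.-]+@[A-Z0-9]+\.[A-Z]{2,3}/gi);

// Hypothetical mixed-case input to show the i flag at work.
const mixed = "Write to This@That.COM".match(/[A-Z0-9_+.-]+@[A-Z0-9]+\.[A-Z]{2,3}/gi);

console.log(explicit);    // ["this@that.com"]
console.log(insensitive); // ["this@that.com"]
console.log(mixed);       // ["This@That.COM"]
```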
elreimundo