I've been reading about basic regular expressions on different websites to study them. My problem is that I don't understand some of them. Here is an example from w3schools that validates an e-mail address:

$email = test_input($_POST["email"]);
if (!preg_match("/([\w\-]+\@[\w\-]+\.[\w\-]+)/",$email)) {
   $emailErr = "Invalid email format"; 
}

I don't understand the part [\w\-]+. As I understand it, it means "a string that has at least one alphanumeric character". Can you give me a clear explanation of this?

Phil
zlloyd
  • Use regular-expressions.info; it's a good site for regexp tutorials. You can also enter your regexp into http://gskinner.com/RegExr/, then hover your mouse over each piece and it will explain it in a tooltip. – Barmar Feb 21 '14 at 03:50
  • [regular-expressions.info](http://www.regular-expressions.info/tutorial.html) has an excellent (free) tutorial and is a good place to start. – ridgerunner Feb 21 '14 at 03:51
  • FYI - that pattern will invalidate valid email addresses like `me+stackoverflow@example.com` or `me@example`. It will also pass invalid addresses like `hey look@this.thing over here` – Phil Feb 21 '14 at 04:00
  • [regex101.com](http://regex101.com/) has been helpful for me to visualize what is going on. – CommandZ Feb 21 '14 at 04:09

3 Answers

The character class [\w\-] (or more accurately without the unnecessary escaping, [\w-]) means

  1. \w - a word character: any letter, digit or underscore, or...
  2. - a literal hyphen

Using [\w-]+ means "one or more letters, numbers, underscores or hyphens".
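
To see what the class picks up in practice, here is a minimal sketch. The sample string and variable names are made up for illustration; only preg_match_all and preg_match are real PHP functions.

```php
<?php
// Hypothetical sample input: contains letters, digits, an underscore,
// hyphens, and a few characters that are NOT in the class (space, @, .).
$sample = 'first-name_01 @ example.com';

// Collect every run of one or more word characters or hyphens.
preg_match_all('/[\w-]+/', $sample, $matches);
print_r($matches[0]); // the runs found: first-name_01, example, com

// The escaped and unescaped hyphen behave the same here, because a
// hyphen placed right after \w cannot start a range.
$escaped   = preg_match('/^[\w\-]+$/', 'a-b');
$unescaped = preg_match('/^[\w-]+$/', 'a-b');
var_dump($escaped === $unescaped); // bool(true)
```

The space, the @ and the dot are not part of the class, so they split the input into three separate runs.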

As mentioned in the comments above, don't use W3Schools. http://www.regular-expressions.info/ is the best resource available (IMHO).

Phil

Explanation:

[\w\-]+

This means any word character (a-z, A-Z, 0-9 and underscore) or a hyphen \-, repeated between one and unlimited times, as many times as possible, giving back as needed (greedy) +
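
A small sketch of what "giving back as needed" looks like, assuming a made-up input string and a trailing literal hyphen added to the pattern purely for demonstration. The lazy variant (+?) is shown only for contrast.

```php
<?php
// Hypothetical input chosen so the quantifier has to backtrack.
$input = 'a-b-end';

// Greedy: [\w-]+ first consumes the whole string, then gives characters
// back one at a time until the literal "-" after the group can match.
preg_match('/([\w-]+)-/', $input, $greedy);

// Lazy: [\w-]+? takes as little as possible, stopping at the first
// position where the literal "-" can match.
preg_match('/([\w-]+?)-/', $input, $lazy);

echo $greedy[1], "\n"; // a-b
echo $lazy[1], "\n";   // a
```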

Some good regex resources for learning:

http://Regex101.com

http://www.regular-expressions.info

Get to it.

Vasili Syrakis

Here's the breakdown:

  1. \w is a character class which simply means letters, numbers, plus underscore. In regex, this is short for [A-Za-z0-9_]
  2. \w\- adds the hyphen to the \w class (not sure why the hyphen is escaped)
  3. [\w\-]+ means the class must match one or more times. So, 9@email.com is valid, but @email.com is not, because nothing comes before the @.
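
Here is a rough sketch that runs the pattern from the question against a few inputs. The $anchored variant and the result arrays are my own additions for comparison, not part of the original code. One thing to watch for: the original pattern has no ^ and $ anchors, so preg_match succeeds as soon as any substring of the input fits.

```php
<?php
// Pattern copied from the question (unanchored).
$pattern = '/([\w\-]+\@[\w\-]+\.[\w\-]+)/';
// Hypothetical anchored variant: the whole input must fit the pattern.
$anchored = '/^[\w\-]+\@[\w\-]+\.[\w\-]+$/';

$samples = [
    '9@email.com',
    '@email.com',
    'me+stackoverflow@example.com',
    'hey look@this.thing over here',
];

$loose = [];
$strict = [];
foreach ($samples as $s) {
    $loose[$s]  = preg_match($pattern, $s) === 1;
    $strict[$s] = preg_match($anchored, $s) === 1;
    printf("%-32s unanchored=%-5s anchored=%s\n", $s,
        var_export($loose[$s], true), var_export($strict[$s], true));
}
```

Unanchored, only @email.com fails. The last two inputs pass because the substrings stackoverflow@example.com and look@this.thing fit the pattern. With the anchors added, only 9@email.com passes, since + and the space are not in the character class.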

Also, depending on your use case, you may be interested in this discussion on SO about why relying on regexes to validate email addresses may be a bad idea:

Using a regular expression to validate an email address
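
If you would rather not maintain your own pattern, one commonly used alternative in PHP is the built-in filter_var with FILTER_VALIDATE_EMAIL. This is a swapped-in technique, not something from the w3schools example, and the helper name below is made up. Treat it as a plausibility check only; it cannot tell you whether the mailbox actually exists.

```php
<?php
// Hypothetical helper name. filter_var returns the filtered string on
// success and false on failure, so compare against false explicitly.
function is_plausible_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

$plusAddress = is_plausible_email('me+stackoverflow@example.com');
$withSpaces  = is_plausible_email('hey look@this.thing over here');

var_dump($plusAddress); // bool(true)
var_dump($withSpaces);  // bool(false)
```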

Josh Liptzin
  • The hyphen is probably escaped because most newbies (like the authors / editors over at W3Schools) think that a hyphen in a character class **always** defines a range – Phil Feb 21 '14 at 03:57
  • Got it. To the OP, avoid w3schools whenever possible. – Josh Liptzin Feb 21 '14 at 04:02