1

I'm trying to match an email address. Here is what I've come up with so far:

String text = "gandalf_storm@mymail.com"; 
String regex = "(\\w+)@{1}(\\w+){2,}\\.{1}\\w{2,4}";

This works for the following cases:

gandalf_storm@mymail.com
gandalfstorm@mymail.com
gandalf2storm@mymail.com

So it matches one or more word characters (letters, digits or underscore) before a single @, followed by a group of word characters repeated at least two times (two characters being the minimum length for a domain name), followed by a single . (dot), followed by between 2 and 4 word characters (because there are domains such as .us or .mobi).

This expression, however, does not match emails such as:

gandalf.storm@mymail.com
gandalf.storm@mydomain.me.uk
gandalf.storm@mysubdomain.mydomain.me.uk
gandalf.storm@mysubdomain.mysubdomain.mydomain.me.uk
(and so on, with any number of subdomains)

or

gandalf.storm@mymail.com
gandalf2storm@mydomain.me.uk
gandalf_storm@mysubdomain.mydomain.me.uk
gandalfstorm@mysubdomain.mysubdomain.mydomain.me.uk
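The behaviour described above can be reproduced with a small check. This is only a sketch: the class name `OriginalRegexCheck` is made up for illustration, and it assumes the whole string has to match (as with `String.matches()`), not just a substring.

```java
import java.util.regex.Pattern;

public class OriginalRegexCheck {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("(\\w+)@{1}(\\w+){2,}\\.{1}\\w{2,4}");

    // Assumption: the entire input must match, like String.matches().
    static boolean matches(String email) {
        return ORIGINAL.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "gandalf_storm@mymail.com",                 // no dot before @, one-dot domain
            "gandalf2storm@mymail.com",
            "gandalf.storm@mymail.com",                 // dot before @
            "gandalf2storm@mydomain.me.uk",             // more than one dot in the domain
            "gandalf_storm@mysubdomain.mydomain.me.uk"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Under that assumption the first two samples match and the last three do not.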

I've just started to learn regex and I find it interesting to try to solve problems such as this one with regex, not partially but for every case. Any help would be much appreciated. Thank you.

Gandalf StormCrow

3 Answers

2

This question has been asked many, many times before here on SO. Here's why you don't want to use regexes to parse email addresses. Note please that that monster of a regex doesn't even handle comments.

Frank Shearar
  • I've just found a suitable example to learn regex with. Of course I won't parse my emails with it; I'm just trying to learn regex by working through some examples – Gandalf StormCrow Apr 23 '10 at 08:52
  • If you are trying to learn regex, then start with a subject that regex can be used on. Email addresses are not one of those subjects. – Sam Holder Apr 23 '10 at 08:54
  • @Sam Holder what are the subjects on which regex can be used, from your experience? – Gandalf StormCrow Apr 23 '10 at 08:57
  • (...In general you don't need to handle comments as that's part of the RFC822 header format rather than part of the e-mail address as such. However, yes, this kind of regex ‘validation’ is a terrible idea.) – bobince Apr 23 '10 at 09:01
  • @bobince do you have some other examples of where to use regex? Learning is the point here, not using. Of course I'm not going to validate it with regex; checking that it contains @ is enough for me. – Gandalf StormCrow Apr 23 '10 at 09:03
  • Necessity is the mother of invention, so I find learning easiest when I have a specific need. I learnt regexes by having to batch convert a load of Java files to C#. Probably not a good example, but it worked for me. What is it that you want to learn regex for? Perhaps focus on that? I found tools like The Regulator and nregex.com very useful when I was learning, and still do. – Sam Holder Apr 23 '10 at 09:07
  • I've got RegexBuddy, which is very good, and I have some spare time. The other day I needed to match the value inside quotes preceded by some text and an equals sign, and I had to ask on SO for a solution because I didn't know where to start. Even now I can't read the expression that was the solution; I'm still reading until I reach the part where these new characters are used – Gandalf StormCrow Apr 23 '10 at 09:14
  • Part of the problem with regex is maintainability. Often I can get what I want by slowly building the regex up, matching steadily more, but when I come back to it, it is hard to decipher what it is doing, especially if it needs to change. I know you are using Java, but in .NET there is a fluent regex library (http://flimflan.com/blog/ReadableRegularExpressions.aspx) which divides people, but aims to make regex more readable/maintainable. I also find http://www.regular-expressions.info/ to be a useful reference for syntax I find hard to remember – Sam Holder Apr 23 '10 at 09:20
0

The regex you use is very restrictive:

  • Using the \w character class before the @ does not allow the . character, which explains why gandalf.storm does not match
  • In the domain part of the regex, you only allow two "words" separated with a . character, which excludes "mysubdomain.mydomain.net"

You should try to fix these to match your more complicated examples.

As a side note, when you want to match a single character, the {1} quantifier is redundant: @{1} matches exactly the same thing as @.
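One possible way to address both points, and to drop the redundant {1}, is sketched below. The pattern and the class name `RelaxedEmailRegex` are illustrative assumptions rather than a recommended validator: the local part becomes `[\w.]+`, and the domain becomes one or more "word plus dot" labels followed by a final label of 2 to 4 word characters. It still rejects many valid addresses and accepts some invalid ones.

```java
import java.util.regex.Pattern;

public class RelaxedEmailRegex {
    // [\w.]+        local part: word characters or literal dots
    // (\w{2,}\.)+   one or more domain labels, each followed by a dot
    // \w{2,4}       final label, 2 to 4 word characters
    static final Pattern RELAXED =
            Pattern.compile("[\\w.]+@(\\w{2,}\\.)+\\w{2,4}");

    // The whole input must match, as with String.matches().
    static boolean matches(String email) {
        return RELAXED.matcher(email).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "gandalf_storm@mymail.com",
            "gandalf.storm@mymail.com",
            "gandalf.storm@mysubdomain.mysubdomain.mydomain.me.uk",
            "gandalf.storm@mymail"   // no dot in the domain at all
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

Because every repeated domain label has to end in a dot, there is only one way to split a given domain into labels, which avoids the ambiguous nesting of `(\w+){2,}`.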

Thibault Falise
0

To answer your question, as you are learning.

Your regex fails on the first lot partly because the part before the @ does not allow the '.' character. Changing it to this:

 String regex = "([\\w.]+)@(\\w+){2,}\\.\\w{2,4}";

should allow gandalf.storm@mymail.com, because [\\w.]+ means any character from the character class, that is '\w' (a word character: letter, digit or underscore) or '.' (which does not need to be escaped inside a character class, where it means a literal dot), 1 or more times.
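A quick way to see the difference between a dot inside and outside a character class, and what the modified regex does and does not fix, is the sketch below. The class and method names are made up for illustration, and whole-string matching is assumed.

```java
import java.util.regex.Pattern;

public class DotInClassDemo {
    // The modified regex from this answer.
    static final Pattern MODIFIED =
            Pattern.compile("([\\w.]+)@(\\w+){2,}\\.\\w{2,4}");

    static boolean matchesModified(String email) {
        return MODIFIED.matcher(email).matches();
    }

    // Inside [...], the dot is a literal dot.
    static boolean dotInsideClass(String s) {
        return s.matches("[\\w.]+");
    }

    // Outside [...], an unescaped dot matches (almost) any character.
    static boolean dotOutsideClass(String s) {
        return s.matches("\\w+.\\w+");
    }

    public static void main(String[] args) {
        System.out.println(dotInsideClass("gandalf.storm"));   // literal dot is accepted
        System.out.println(dotInsideClass("gandalf-storm"));   // '-' is not in the class
        System.out.println(dotOutsideClass("gandalf-storm"));  // '.' acts as a wildcard here
        System.out.println(matchesModified("gandalf.storm@mymail.com"));
        System.out.println(matchesModified("gandalf.storm@mydomain.me.uk"));
    }
}
```

The last line still reports no match, since the domain part of the pattern is unchanged and allows only a single dot.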

This might be enough help for you to figure the rest out on your own. After all, that is the point of learning :)

I tested this at http://www.regexplanet.com/simple/index.html which uses the java library for the engine.

Sam Holder