This is my expression:

(\w+\.)*\w+\s*(@|\({1}\s*at\s*\){1}|\s+at\s+){1}\s*(\S{2,3}\.)?(\w+)(\s*dot\s*|\s*\.*\s*)(com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum|ru)\b

It extracts ombe@cs.name.edu from l@ombe@cs.name.edu, which is wrong.

I need to exclude l@ombe@cs.name.edu from the possible matches.

I developed this regex to extract emails from a variety of obfuscated strings, rather than to validate them, as part of a homework assignment. I am stuck on the l@ombe@cs.name.edu test.

Could you please help me?

Redefinition:

Expression:

(\w+)@(\w+)\.name\.edu

Two strings:

  • name@name1@cs.name.edu
  • name2@cs.name.edu

As a result I get:

name1@cs.name.edu and name2@cs.name.edu

The first one must not be included at all.
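The behaviour described above can be reproduced with a minimal Python sketch. It assumes Python's `re` module stands in for whatever engine the assignment uses; the names `SIMPLE`, `extract`, `samples` and `results` are hypothetical and only serve the illustration.

```python
import re

# Simplified expression from the question, unchanged.
SIMPLE = re.compile(r"(\w+)@(\w+)\.name\.edu")


def extract(text):
    """Return the full text of every match of SIMPLE found anywhere in text."""
    return [m.group(0) for m in SIMPLE.finditer(text)]


# The two test strings from the question.
samples = ["name@name1@cs.name.edu", "name2@cs.name.edu"]
results = {s: extract(s) for s in samples}

for s, found in results.items():
    print(s, "->", found)
```

Because the search may start at any position, the engine skips the leading `name@` in the first string and still finds `name1@cs.name.edu` further along, which is the unwanted result.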

Dmitry Dyachkov
  • What if somebody has a '.im' address, or one belonging to any other TLD which you have not listed? – ArjunShankar Apr 17 '12 at 11:33
  • It's not a problem, this is just a simplistic approach, which I will replace later – Dmitry Dyachkov Apr 17 '12 at 11:35
  • The regular expression for email is *very* complicated. Look at [this question, and its answers](http://stackoverflow.com/questions/201323/how-to-use-a-regular-expression-to-validate-an-email-addresses) – ArjunShankar Apr 17 '12 at 11:36
  • That is a very good example of validation. But for my homework assignment I have developed a regex to **extract** emails out of a variety of obfuscated strings – Dmitry Dyachkov Apr 17 '12 at 11:51
  • I see that now. You also want to pick out words like 'at', 'dot' and understand what they mean. – ArjunShankar Apr 17 '12 at 11:58

2 Answers

You can anchor a regular expression to the start and end of the string with ^ and $. This forces the entire string to be matched, instead of only a part of it.

In your case:

^(\w+\.)*\w+\s*(@|\({1}\s*at\s*\){1}|\s+at\s+){1}\s*(\S{2,3}\.)?(\w+)(\s*dot\s*|\s*\.*\s*)(com|org|net|edu|gov|mil|biz|info|mobi|name|aero|asia|jobs|museum|ru)\b$
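To illustrate the effect of anchoring, here is a small Python sketch that uses the simplified expression from the question rather than the full one. The choice of Python's `re` module and the names `PATTERN`, `unanchored`, `anchored` and `is_match` are assumptions made for this example only.

```python
import re

# Simplified expression from the question.
PATTERN = r"(\w+)@(\w+)\.name\.edu"

unanchored = re.compile(PATTERN)
# Same expression wrapped in ^ and $, so the match must span the whole string.
anchored = re.compile("^" + PATTERN + "$")


def is_match(regex, text):
    """Return True if regex matches somewhere in text."""
    return regex.search(text) is not None


for text in ["name@name1@cs.name.edu",
             "name2@cs.name.edu",
             "mail name2@cs.name.edu today"]:
    print(repr(text), is_match(unanchored, text), is_match(anchored, text))
```

With the anchors in place the doubled `@` string is rejected, but so is an address embedded in surrounding text, which matters when the goal is extraction from longer strings rather than validation of a single one.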
Joni
  • This doesn't work. For example, the \b at the end excludes "Talk at Supercomputing" from the variants, but somehow I fail to figure out an expression for the start of the string – Dmitry Dyachkov Apr 17 '12 at 11:39

RFC 2822

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Cylian
  • Thank you, good example. I developed this regex to **extract** emails from a variety of obfuscated strings rather than to validate them, as part of my homework assignment. I am stuck on the l@ombe@cs.name.edu test. – Dmitry Dyachkov Apr 17 '12 at 12:18