I am posting the solution to a problem I ran into during development, to get as much feedback as possible from your experience with regex. The goal is to learn how to make regexes shorter and more flexible.

For educational purposes, please explain your corrections.

The problem was:

I have a string that can contain values such as:

  • test@test.com
  • test@test.com;
  • test@test.com;test@test.com
  • test@test.com;test@test.com;

This string must be validated with a regex, in order to extract the individual mail addresses separated by ";".

I arrived at this result:

^(?:((?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9\-]+\.)(?:[a-zA-Z]{2,3}))(\;)?$|(((?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9\-]+\.)(?:[a-zA-Z]{2,3}))(?:\;))+((?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9\-]+\.)(?:[a-zA-Z]{2,3}))(\;)?$)

The regex is this long because of this example:

  • test@test.comtest@test.com;

It passes with my shorter first regex, but it must not pass.

I know that I can shorten [a-zA-Z0-9_-.] by using \w, but that is not my main problem.

Is there a way to make some "groups" reusable, in order to write only once

((?:[a-zA-Z0-9_\-\.]+)@(?:[a-zA-Z0-9\-]+\.)(?:[a-zA-Z]{2,3}))

and reuse it within the same regex?

The language is PHP but the question is broader on regular expressions in general.

Thanks a lot

Th3Mouk
  • What language are you working in? if you use an object-oriented language, there are better ways to verify if something is a valid email address. For example, the .NET framework has a MailAddress object that does this validation for you, without having to use a Regex. – Nzall Sep 19 '14 at 13:57
  • I'm not getting what you are trying to ask. What are your expected outputs? That'll really help. – Steven Xu Sep 19 '14 at 14:00
  • While not the exact duplicate, still worth noting: http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address?rq=1 – PM 77-1 Sep 19 '14 at 14:01
  • AFAIK you can't "reuse" regex fragments (like a macro) - you must recode the section again – Bohemian Sep 19 '14 at 14:03
  • Do you actually need to capture the email addresses? If this is the case, then you must repeat the patterns. This is because each capture group is *one* capture group, even if it *repeats* ([see example](http://regex101.com/r/nF5fO3/1)). The last instance of an email address would always overwrite previous captures. TL;DR; if you need to capture emails in group 1 and optionally group 2, you need that long regex. – Sam Sep 19 '14 at 14:12
  • However, `[a-zA-Z0-9_\-\.]` can be reduced to `[\w.-]` because `\w === [a-zA-Z0-9_]`, `.` does not have special meaning in a character class, and `-` does not have a special meaning in a character class if it is at the beginning or end (and can't be a range). `[a-zA-Z]` can be replaced with `[a-z]` if you make the expression case-insensitive `(?i)`. – Sam Sep 19 '14 at 14:13
  • Finally, you can do something simple like [`^([^;]+)(?:;([^;]+));?`](http://regex101.com/r/nF5fO3/2) and then check if capture group 1 is a valid email (depending on the language, there are sometimes easy ways to do this) and same with capture group 2 if it isn't null. – Sam Sep 19 '14 at 14:15
  • Thanks all for your answers. The language is PHP but the question is broader on regular expressions in general. – Th3Mouk Sep 19 '14 at 14:16

1 Answer

I am using Python, and this is how I reuse a regex: re.compile compiles the regular expression once so that the compiled object can be used for several searches.

import re

# Compile the pattern once; the compiled object can be reused for every search.
prog = re.compile(r'(\w+@\w+\.\w{2,3})')
s = prog.search('content_to_be_searched1')
if s:
    print(s.group())
r = prog.search('content_to_be_searched2')
if r:
    print(r.group())

This follows the Python docs. The pattern I use to find a mail address is:

(\w+@\w+\.\w{2,3})

I don't know the PHP equivalent.
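To write the address pattern only once inside a single regex, one option in Python is to keep the fragment in a string and build the full pattern from it. The following is a minimal sketch, not a definitive solution: the names EMAIL, EMAIL_LIST and parse_addresses are made up for illustration, and the fragment is the address pattern from the question with the redundant escapes removed.

```python
import re

# Reusable fragment: the address pattern from the question, written once.
# (Hypothetical name; "." and a trailing "-" need no escaping inside [...].)
EMAIL = r"[a-zA-Z0-9_.-]+@[a-zA-Z0-9-]+\.[a-zA-Z]{2,3}"

# One address, then any number of ";address" repetitions,
# then an optional trailing ";".
EMAIL_LIST = re.compile(rf"{EMAIL}(?:;{EMAIL})*;?")


def parse_addresses(text):
    """Return the addresses in text, or None if text is not a valid list."""
    # fullmatch anchors the pattern at both ends, like ^...$ in the question.
    if EMAIL_LIST.fullmatch(text) is None:
        return None
    # A repeated capture group keeps only its last match, so the addresses
    # are recovered by splitting instead of capturing.
    return [part for part in text.split(";") if part]


if __name__ == "__main__":
    for sample in ("test@test.com;test@test.com;", "test@test.comtest@test.com;"):
        print(sample, "->", parse_addresses(sample))
```

The same string-building idea should carry over to PHP, where the pattern is also an ordinary string. PCRE, the engine behind PHP's preg functions, additionally supports subroutine calls such as (?1), which re-run the pattern of group 1 inside the same regex. Python's re module has no equivalent, which is why this sketch composes the pattern instead.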

Pavan Kumar T S