How do I achieve the following with a regex:

  • Match if the string doesn't start with a certain character
  • Match if there are no two adjacent ","s (or two adjacent copies of some other character)
  • Match if the string's double quotes come in pairs, even if the two quotes of a pair are not adjacent
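A minimal sketch of the three rules as Python regex fragments, assuming "-" stands in for the forbidden first character and "," for the character that must not be doubled (swap in your own). The first two rules are negative lookaheads, which test a condition without consuming any text, so they can be stacked in front of the part that does the actual matching. The third matches strings containing an even number of double quotes. `RULES` and `follows_rules` are hypothetical names.

```python
import re

# Rule 1: must not start with a given character (here "-").
# (?!-) is a negative lookahead: it fails if the next character is "-"
# and consumes nothing, so the following checks start at the same position.
NO_LEADING_DASH = r'(?!-)'

# Rule 2: no two adjacent occurrences of a character (here ",").
# (?!.*,,) scans ahead through the rest of the line for ",,".
NO_DOUBLE_COMMA = r'(?!.*,,)'

# Rule 3: double quotes must come in pairs, adjacent or not.
# Quote-free text, then any number of "..." groups, each followed by more
# quote-free text; a lone '"' has nothing to pair with, so matching fails.
PAIRED_QUOTES = r'[^"]*(?:"[^"]*"[^"]*)*'

# All three rules together; fullmatch anchors the pattern to the whole string.
RULES = re.compile(NO_LEADING_DASH + NO_DOUBLE_COMMA + PAIRED_QUOTES)


def follows_rules(s: str) -> bool:
    return RULES.fullmatch(s) is not None


for s in ['a,b"c,d"', '-abc', 'a,,b', 'a"b', 'a"b"c"d"']:
    print(s, follows_rules(s))
```

The `.*` inside the second lookahead does not cross newlines, so this assumes single-line input.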

Using Python.

I am currently trying to match email addresses that follow these rules. The pattern I have so far is:

pattern = r'^([A-Z0-9._\-"]|"[!,;]"){1,127}@[^-][A-Z0-9.-]{3,256}\.[A-Z]{2,4}[^-]$'

But I am unsure how to implement these rules.
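Two pieces of Python `re` syntax matter when writing a pattern like this, shown as a small runnable sketch (the `compiles` helper is hypothetical, written only for this illustration): a bounded repeat is written `{m,n}` with a comma, while `{m-n}` is not a quantifier at all; and a hyphen inside `[...]` forms a range unless it is escaped or placed last. Separately, possessive quantifiers such as `{1,127}+` are only supported from Python 3.11; earlier versions reject them with a "multiple repeat" error.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# A bounded repeat is written {m,n}, with a comma.
print(re.fullmatch(r'a{1,3}', 'aaa') is not None)      # True

# {1-3} is not a quantifier, so re matches the text "{1-3}" literally.
print(re.fullmatch(r'a{1-3}', 'aaa') is not None)      # False
print(re.fullmatch(r'a{1-3}', 'a{1-3}') is not None)   # True

# Inside [...], "x-y" is a range. '_' (0x5F) sorts after '"' (0x22),
# so [_-"] is rejected as a bad character range.
print(compiles(r'[_-"]'))                              # False

# Escaping the hyphen, or putting it last, makes it a literal "-".
print(compiles(r'[._\-"]'), compiles(r'[._"-]'))        # True True
```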

To be more specific: I want a pattern that matches an email address consisting of two parts (name and domain). The name part comes before the @ and should be no longer than 128 characters. It should consist of a-z and 0-9 characters, plus ., _, - and the double quote ("). The name can't contain two adjacent dots. If the name contains a double quote, it must be paired with another one. The characters !, ; and , are allowed in the name only between a pair of double quotes.
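One way to encode these name-part rules in a single regex, as a sketch under the assumptions that only the name part (the text before @) is passed in and that letters may be upper or lower case. Lookaheads handle the length limit and the adjacent-dot ban; the alternation allows !, ; and , only inside a quoted run, which also forces every double quote to be paired. `LOCAL_PART` and `valid_local_part` are hypothetical names.

```python
import re

LOCAL_PART = re.compile(
    r'''
    (?=.{1,128}\Z)          # 1 to 128 characters in total
    (?!.*\.\.)              # no two adjacent dots
    (?:
        [a-z0-9._-]         # an ordinary character
      | "[a-z0-9._!;,-]*"   # a quoted run; only here are ! ; , allowed
    )+
    ''',
    re.IGNORECASE | re.VERBOSE,
)


def valid_local_part(local: str) -> bool:
    return LOCAL_PART.fullmatch(local) is not None


for name in ['john.doe', 'john..doe', '"a,b!c"', 'a,b', 'a"b']:
    print(name, valid_local_part(name))
```

`re.VERBOSE` lets the pattern carry inline comments; whitespace in it is ignored except inside character classes.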

The domain name should be no shorter than 3 and no longer than 256 characters, and its parts should be separated by dots. The domain name can't begin or end with -.
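A similar sketch for the domain part, assuming "separated by a dot" means at least one dot with no empty parts, and reusing the letters, digits and hyphen from the `[A-Z0-9.-]` class in the original pattern. `DOMAIN` and `valid_domain` are hypothetical names.

```python
import re

DOMAIN = re.compile(
    r'''
    (?=.{3,256}\Z)          # 3 to 256 characters in total
    (?!-)                   # must not begin with "-"
    (?:[a-z0-9-]+\.)+       # one or more non-empty parts, each followed by a dot
    [a-z0-9-]+              # the last part
    (?<!-)                  # must not end with "-"
    ''',
    re.IGNORECASE | re.VERBOSE,
)


def valid_domain(domain: str) -> bool:
    return DOMAIN.fullmatch(domain) is not None


for d in ['example.com', '-example.com', 'example.com-', 'localhost', 'a..b']:
    print(d, valid_domain(d))
```

A full address could then be checked by splitting on the last @ with `address.rpartition('@')` and passing each side to the matching check.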

This information is given to help you understand what I want; the main question is about the three rules I stated at the top. I would appreciate it if you could tell me how to achieve them.

Euphe
  • When asking regex questions, please include the following information: 1) what language / platform are you using? 2) what have you tried so far? 3) provide examples of what strings you'd like to match and what strings you don't want to match – p.s.w.g Jan 08 '14 at 14:54
  • @p.s.w.g, thank you for your comment, I modified the question. – Euphe Jan 08 '14 at 14:56
  • I’ve tried adapting the title to describe your actual problem more informatively, but to be honest I’m not sure what you actually want to match: your description is too vague, and the regex you’ve posted doesn’t match your description *at all*. Please be more specific. – Konrad Rudolph Jan 08 '14 at 15:02
  • Please add examples of what should be accepted and what should not. – finiteautomata Jan 08 '14 at 15:04
  • I edited the question and described exactly what I am trying to do. – Euphe Jan 08 '14 at 15:07
  • See e.g. http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address?rq=1 – Fredrik Pihl Jan 08 '14 at 15:10
  • @FredrikPihl, thank you for the link, but I am more interested in understanding how to do the specific things in my list of three statements starting with "Match if", so that I can understand regex better and use it even when I'm not validating email addresses. – Euphe Jan 08 '14 at 15:16
  • Don't! Write small Python functions that each check one of your requirements. If all of them return True, you have a match. Without this approach, how would you know which part of the regexp failed? The function approach will also be much easier to update if the requirements change, compared to throwing away your regexp and starting from scratch! – Fredrik Pihl Jan 08 '14 at 15:19
  • @FredrikPihl, how do I write a function that matches only if there are no two adjacent dots in the part of the email address before the @? – Euphe Jan 08 '14 at 15:27
  • `return '..' not in data` – Fredrik Pihl Jan 08 '14 at 15:30
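Following Fredrik Pihl's suggestion, here is a sketch of the one-function-per-rule approach for the name part. The function names are hypothetical; each check returns a bool, and `failed_checks` reports which rules an input breaks, which a single large regex cannot tell you. Note that `str.find` returns -1 (truthy) when the substring is absent and 0 (falsy) when it occurs at the very start, so a membership test such as `'..' not in local` is the safer way to check for adjacent dots.

```python
import re
from typing import List


def length_ok(local: str) -> bool:
    return 1 <= len(local) <= 128


def no_adjacent_dots(local: str) -> bool:
    return '..' not in local


def quotes_are_paired(local: str) -> bool:
    return local.count('"') % 2 == 0


def allowed_characters(local: str) -> bool:
    # Splitting on '"' puts the quoted segments at odd indices.
    # Outside quotes: letters, digits, . _ -; inside, ! ; , are allowed too.
    for i, segment in enumerate(local.split('"')):
        allowed = r'[a-z0-9._!;,-]*' if i % 2 else r'[a-z0-9._-]*'
        if not re.fullmatch(allowed, segment, re.IGNORECASE):
            return False
    return True


CHECKS = [length_ok, no_adjacent_dots, quotes_are_paired, allowed_characters]


def failed_checks(local: str) -> List[str]:
    """Return the names of the checks `local` fails (empty list if none)."""
    return [check.__name__ for check in CHECKS if not check(local)]


for name in ['john.doe', 'john..doe', 'a,b', '"a,b"']:
    print(name, failed_checks(name))
```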

1 Answer

I am confused about your question: your title says comma-separated list, but then you talk about email addresses. There is a widely circulated regex based on the RFC 5322 address syntax:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
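A sketch of how that regex might be used from Python. It is written lowercase-only, so this assumes `re.IGNORECASE` to accept upper-case letters, and `re.fullmatch` so that nothing may precede or follow the address. The pattern is split into adjacent raw-string literals, which Python concatenates; `EMAIL_RE` and `is_email` are hypothetical names.

```python
import re

EMAIL_RE = re.compile(
    # name part: dot-separated atoms, or a quoted string
    r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"""
    r"""|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")"""
    r"""@"""
    # domain: dot-separated labels, or a bracketed address literal like [192.168.0.1]
    r"""(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"""
    r"""|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?"""
    r"""|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])""",
    re.IGNORECASE,
)


def is_email(address: str) -> bool:
    return EMAIL_RE.fullmatch(address) is not None


for a in ['john.doe@example.com', 'john..doe@example.com', '"a,b"@example.com']:
    print(a, is_email(a))
```

This pattern enforces the RFC-style rules rather than the asker's custom ones: for example, it does not cap the name at 128 characters, and it allows characters such as + outside quotes.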

Joe