My email regex also extracts strings such as `.comthis` instead of stopping at the `.com` domain. How do I make it match only the useful domains?

import re
regex = re.compile(r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
                   r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
                   r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)")
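
A minimal reproduction of the behaviour described, assuming the input has an address glued to the following word (the sample string is made up for illustration). The last part of the pattern accepts any run of letters and digits as the final domain label, so it has no way to know that the domain ends at `.com`:

```python
import re

# The pattern from the question, written with raw strings.
regex = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
    r"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
    r"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"
)

# Hypothetical input: no whitespace between ".com" and the next word.
text = "contact john@example.comthis address is old"

match = regex.search(text)
print(match.group(1))  # john@example.comthis
```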
Zaibi
  • Did you care to post the regex that you are using? – Kenneth K. Apr 01 '16 at 08:19
  • You need to post your regex here to get any help – bashrc Apr 01 '16 at 08:19
  • Perhaps, you should list the TLDs you are interested in and use an alternation at the end, like `(?:com?|org|net|mobi))\b`: [`r'(?i)([a-z0-9!#$%&\'*+\/=?^_\`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_\`{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+(?:com?|org|net|mobi))\b'`](https://regex101.com/r/fQ0cC9/2). – Wiktor Stribiżew Apr 01 '16 at 08:24
  • You are well aware that your regexp will never be complete? http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html – Marco Mariani Apr 01 '16 at 08:34
  • Or fetch everything the (probably broken anyway) regex returns, and check in the following code whether the domains it extracted actually exist. There are libraries which try to keep abreast of the current state of the available TLDs in the world but you can simply check if the part after `@` can be resolved with a simple DNS query (and perhaps still get some false alarms on stuff where the string after `@` just happens to be a random domain somewhere in our still-expanding universe). – tripleee Jan 18 '18 at 09:56
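
The TLD alternation suggested in the comments above could look roughly like the sketch below. The TLD list is only the one from that comment, the sample text is invented, and the names `TLDS` and `email_re` are illustrative, so treat this as a starting point rather than a complete solution:

```python
import re

# Assumption: only these TLDs are of interest; extend the alternation as needed.
TLDS = r"(?:com?|org|net|mobi)"

email_re = re.compile(
    r"([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*"
    r"(@|\sat\s)"
    r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|\sdot\s))+"
    + TLDS +
    r")\b",  # the word boundary rejects "comthis"
    re.IGNORECASE,
)

# Hypothetical sample text.
text = ("Write to alice@example.org or bob@example.comthis, "
        "or carol at example dot net today.")

# Group 1 holds the whole address.
found = [m.group(1) for m in email_re.finditer(text)]
print(found)  # ['alice@example.org', 'carol at example dot net']
```

With the trailing `\b`, an address glued to other text is skipped entirely. Dropping the `\b` would instead cut the match off at the TLD (`bob@example.com`), but it would also truncate something like `example.community` to `example.com`, so which variant fits depends on the data.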

1 Answer

Here's something I think might help:

import re
s = 'My name is Conrad, and blahblah@gmail.com is my email.'
# Match "@" followed by one or more word characters or periods.
domain = re.search(r"@[\w.]+", s)
print(domain.group())

outputs

@gmail.com

How the regex works:

@ - scan until you see this character

[\w.] - a set of characters to match: \w covers alphanumeric characters and the underscore, and the trailing period . adds a literal period to that set.

+ - one or more characters from the previous set.

Because this regex is matching the period character and every alphanumeric after an @, it'll match email domains even in the middle of sentences.
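
To see how this pattern behaves on a few inputs, here is a small sketch. The second and third sample strings are made up for illustration: one glues the address to the next word, as in the question, and one ends the sentence right after the address:

```python
import re

domain_re = re.compile(r"@[\w.]+")

# The first sample is from the answer; the other two are hypothetical.
samples = [
    "My name is Conrad, and blahblah@gmail.com is my email.",
    "Reach me at blahblah@gmail.comthis week.",
    "Send it to blahblah@gmail.com.",
]

results = [domain_re.search(s).group() for s in samples]
for r in results:
    print(r)
# @gmail.com
# @gmail.comthis
# @gmail.com.
```

Since `[\w.]+` keeps consuming word characters and periods, it still picks up glued text such as `comthis`, and it also includes a sentence-ending period.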

aadesh raj
  • The problem was not that the OP's regex is matching too little. – tripleee Jan 18 '18 at 09:58
  • Your question is poorly worded and hard for someone to provide a good answer as your intent is not clear. Please read the SO guidelines before posting. – sparkitny Jan 18 '18 at 10:08