Regex get domain name from email

Question

I am learning regex and am having trouble extracting google from an email address.

String

first.name@google.com

I just want to get google, not google.com

Regex:

[^@].+(?=\.)

Result: https://regex101.com/r/wA5eX5/1

From my understanding, it ignores the @, then finds the string after it up to the . (dot) using (?=\.)

What did I do wrong?

The `[^@]` matches one character that is not `@`. The `.+` matches one or more of any character (excluding newline). Try https://regex101.com/r/wA5eX5/2 – chris85, Aug 18 '16 at 20:50

score 31 · Accepted Answer · answered Aug 18 '16 at 20:50


[^@] means "match one symbol that is not an @ sign". That is not what you are looking for. Use a lookbehind (?<=@) for the @ and your (?=\.) lookahead for the \. to extract the server name in the middle:

(?<=@)[^.]+(?=\.)

The middle portion [^.]+ means "one or more non-dot characters".

Demo.
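One way to compare the two patterns is to run them in Python, whose re module supports the fixed-width lookbehind used here. This is only a sketch: the variable names are made up for illustration, and a@google is an example of an address with no dot after the host.

```python
import re

original = re.compile(r"[^@].+(?=\.)")    # pattern from the question
fixed = re.compile(r"(?<=@)[^.]+(?=\.)")  # pattern from this answer

email = "first.name@google.com"

# The original starts at the first non-@ character, and the greedy .+
# runs up to the last dot, so the local part ends up in the match.
print(original.search(email).group())  # first.name@google

# The lookbehind anchors the match right after the @ sign.
print(fixed.search(email).group())     # google

# With no dot after the host, the trailing lookahead rejects the input.
print(fixed.search("a@google"))        # None
```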

answered Aug 18 '16 at 20:50

Sergey Kalinichenko


Thank you. It appears to work fine if you take out `(?=\.)` – I'll-Be-Back Aug 18 '16 at 21:11
@I'll-Be-Back That is true - `[^.]` takes care of that. However, if an address looks like this `a@google`, expression without `(?=\.)` look-ahead would still match `google`, while expression with the look-ahead would reject such string as invalid. – Sergey Kalinichenko Aug 18 '16 at 21:33
2

`(?<=@)[^.]+(?=\.).*` for everything after `@`, including subdomain – TryTryAgain Oct 30 '18 at 02:27
3

Lookbehinds are only supported by Chrome – yairniz Jul 10 '19 at 06:48
@yairniz That's not true, lookbehinds are supported by many regex engines, such as ones supplied with .NET and Java. – Sergey Kalinichenko Jul 10 '19 at 10:26
@dasblinkenlight Please add a warning not to use this on the web; it breaks on Firefox right now – yairniz Jul 10 '19 at 10:51
@yairniz that would be unwarranted because the question is not tagged for the web; it’s a “generic” regex question. – Sergey Kalinichenko Jul 10 '19 at 11:51
@dasblinkenlight You are correct but better safe than sorry – yairniz Jul 11 '19 at 12:13

score 19 · Answer 2 · edited Jan 06 '17 at 01:16


Updated answer:
Use a capturing group and keep it simple :)

@(\w+)

Explanation, piece by piece:
( ) is a capturing group for extraction
\w stands for a word character [A-Za-z0-9_]
+ is a quantifier for one or more occurrences of \w

Regex explanation and demo on Regex101
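A short Python sketch of the capturing-group approach, assuming the sample address from the question. The second call uses a hyphenated host to show a limitation: \w does not include the hyphen, so the capture stops early.

```python
import re

pattern = re.compile(r"@(\w+)")

# group(1) is the text captured by the parentheses, without the @ sign.
print(pattern.search("first.name@google.com").group(1))  # google

# A hyphen is not a word character, so the capture ends at "my".
print(pattern.search("jane.doe@my-bank.no").group(1))     # my
```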

edited Jan 06 '17 at 01:16

bobble bubble


answered Aug 18 '16 at 20:50

Rahul Desai


Stephen · Answer 3 · 2018-02-07T21:59:35.013

7

I used the solution's regex for my task, but realized that some of the emails weren't that easy: foo@us.industries.com, foobar@tm.valves.net, and foo@ge.test.com

To anyone who came here wanting the sub domain as well (or is being cut off by it), here's the regex:

(?<=@)[^.]*.[^.]*(?=\.)
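To see what this pattern returns in practice, here is a small Python sketch. The first two addresses come from this answer; first.name@google.com is from the question, and user@domain.co.uk is a made-up address with a two-part suffix. Note that the middle dot in the pattern is unescaped, so it can match any character, which is what lets the single-label case fall back to google.

```python
import re

pattern = re.compile(r"(?<=@)[^.]*.[^.]*(?=\.)")

emails = [
    "foo@us.industries.com",
    "foobar@tm.valves.net",
    "first.name@google.com",
    "user@domain.co.uk",  # hypothetical address with a two-part suffix
]

# Map each address to the text matched after the @ sign.
results = {email: pattern.search(email).group() for email in emails}

for email, matched in results.items():
    print(email, "->", matched)
# foo@us.industries.com -> us.industries
# foobar@tm.valves.net -> tm.valves
# first.name@google.com -> google
# user@domain.co.uk -> domain.co
```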

edited Feb 07 '18 at 21:59

answered Feb 07 '18 at 21:53

Stephen


This is the correct solution to cater for domains with subdomains. – Shawn Vader Dec 04 '19 at 13:58
Thanks for sharing, but it doesn't seem to work well with a structure like name.surname@subd.maind.com – Daniel Chepenko Apr 08 '21 at 12:11
Also this won't work with domains with the following structure: `domain.co.uk` – nsayer Sep 13 '21 at 16:02
2

@DanielChepenko I'm not sure what you are trying to grab, but the sub and main are being grabbed the way I've mentioned above. https://regex101.com/r/DUrvDI/1 – Stephen Sep 13 '21 at 21:01

score 4 · Answer 4 · answered Jul 08 '19 at 15:13

I was working on extracting the domain name from email addresses, and none of the answers did what I needed:

Do not capture subdomains
Match country-code top-level domains (like .com.ar or .co.jp)

For example, in test@ext.domain.com.mx I need to match domain.com.mx

So I made this one:

[^.@]*?\.\w{2,}$|[^.@]*?\.com?\.\w{2}$

Here is a link to regex101 to illustrate the regex: https://regex101.com/r/vE8rP9/59

You can get the domain name alone (without the top-level domain, e.g. .com or .com.mx) by adding lookaround operators (but it will match twice in test@test.com.mx):

[^.@]*?(?=\.\w{2,}$)|[^.@]*?(?=\.com?\.\w{2}$)
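As a rough check of the first pattern in this answer (the one that keeps the top-level domain), the sketch below runs it with Python's re.search. The helper name domain_without_subdomain is made up for illustration, and test@domain.co.jp is an invented address for the co.jp case.

```python
import re

# First pattern from this answer: domain plus TLD, no subdomain.
DOMAIN_WITH_TLD = re.compile(r"[^.@]*?\.\w{2,}$|[^.@]*?\.com?\.\w{2}$")


def domain_without_subdomain(email):
    """Return the first match of the pattern, or None if nothing matches."""
    match = DOMAIN_WITH_TLD.search(email)
    return match.group() if match else None


print(domain_without_subdomain("test@ext.domain.com.mx"))  # domain.com.mx
print(domain_without_subdomain("test@test.com"))           # test.com
print(domain_without_subdomain("test@domain.co.jp"))       # domain.co.jp
```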

Israel Unterman · Answer 5 · 2016-08-18T21:05:18.137

3

This should be the regex:

(?<=@)[^.]+

(?<=@) - places the search right after the @
[^.]+ - takes all the characters that are not a dot (stops at the first dot)

So it extracts google from the email address.

edited Aug 18 '16 at 21:05

answered Aug 18 '16 at 20:55

Israel Unterman


score 2 · Answer 6 · answered Aug 19 '16 at 03:11

Maybe not strictly a "full regex answer", but a more flexible option (in case the part before the @ is not "first.last") would be to use cut:

cut -d @ -f 2 | cut -d . -f 1

The first cut isolates the part after the @ and the second one extracts what you want. This also works for other kinds of email patterns: xxxx@server.com, xxx.yyy.zzz@server.com and so on.
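The same two-step split can be written without a regex in Python. This is a sketch that assumes the input contains a single @; the function name host_label is hypothetical.

```python
def host_label(email: str) -> str:
    """Mimic: cut -d @ -f 2 | cut -d . -f 1"""
    # Step 1: keep what follows the @ sign.
    after_at = email.partition("@")[2]
    # Step 2: keep what precedes the first dot.
    return after_at.split(".")[0]


print(host_label("first.name@google.com"))   # google
print(host_label("xxx.yyy.zzz@server.com"))  # server
```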

m1m1k · Answer 7 · 2022-09-08T22:41:08.863

Thanks everyone for your great responses, I took what you had and expanded it with labelled match-groups for easy extraction of separate parts.

Caveat : Regex.Speed = Slow

Another post mentioned how SLOW and nonperformant regexes are, and that is a fair point to remember. My particular need is targeting my own background/slow/reporting processes and therefore it doesn't matter how long it takes. But it's good to remember whenever possible Regex should NOT be used in any sort of web page load or "needs-to-be-quick" kind of application. In that case you're much better off using substring to algorithmically strip down the inputs and throw away all the junk that I'm optionally matching/allowing/including here.

https://regex101.com/r/ZnU3OC/1

One Regex to rule them all...

Subdomain/Domain/TopLevelDomain/CountryCode extraction for Emails, domain lists, & URLs
Also handles ?Querystring=junk, Slashes/With/Paths, #anchors
Now with more broth, batteries not included

^(?<Email>.*@)?(?<Protocol>\w+:\/\/)?(?<SubDomain>(?:[\w-]{2,63}\.){0,127}?)?(?<DomainWithTLD>(?<Domain>[\w-]{2,63})\.(?<TopLevelDomain>[\w-]{2,63}?)(?:\.(?<CountryCode>[a-z]{2}))?)(?:[:](?<Port>\d+))?(?<Path>(?:[\/]\w*)+)?(?<QString>(?<QSParams>(?:[?&=][\w-]*)+)?(?:[#](?<Anchor>\w*))*)?$

not overly complicated at all... why would you even say that?

Substitution / Outputs

EXAMPLE INPUT: "https://www.stackoverflow.co.uk/path/2?q=mysearch&and=more#stuff"
EXAMPLE OUTPUT:
{
  Protocol:            "https://"
  SubDomain:           "www"
  DomainWithTLD:       "stackoverflow.co.uk"
  Domain:              "stackoverflow"
  TopLevelDomain:      "co"
  CountryCode:         "uk"
  Path:                "/path/2"
  QString:             "?q=mysearch&and=more#stuff"
}
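For readers who want to try the named groups outside regex101, here is a sketch of a Python port. The only intended change is the group syntax: (?<Name>...) becomes (?P<Name>...), and the pattern is split across lines for readability. The constant name URL_OR_EMAIL and the helper parts are made up for illustration. In this port the SubDomain group keeps its trailing dot.

```python
import re

# Same pattern as above, with .NET-style (?<Name>...) rewritten as (?P<Name>...).
URL_OR_EMAIL = re.compile(
    r"^(?P<Email>.*@)?"
    r"(?P<Protocol>\w+:\/\/)?"
    r"(?P<SubDomain>(?:[\w-]{2,63}\.){0,127}?)?"
    r"(?P<DomainWithTLD>(?P<Domain>[\w-]{2,63})\.(?P<TopLevelDomain>[\w-]{2,63}?)"
    r"(?:\.(?P<CountryCode>[a-z]{2}))?)"
    r"(?:[:](?P<Port>\d+))?"
    r"(?P<Path>(?:[\/]\w*)+)?"
    r"(?P<QString>(?P<QSParams>(?:[?&=][\w-]*)+)?(?:[#](?P<Anchor>\w*))*)?$"
)


def parts(text):
    """Return the non-empty named groups for text, or None if it does not match."""
    match = URL_OR_EMAIL.match(text)
    if match is None:
        return None
    return {name: value for name, value in match.groupdict().items() if value}


url = parts("https://www.stackoverflow.co.uk/path/2?q=mysearch&and=more#stuff")
print(url["SubDomain"], url["Domain"], url["TopLevelDomain"], url["CountryCode"])
# www. stackoverflow co uk

mail = parts("first.name@google.com")
print(mail["Domain"], mail["TopLevelDomain"])  # google com
```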

Allowed/Compliant Domains : Should ALL MATCH

www.bankofamerica.com
bankofamerica.com.securersite.regexr.com
bankofamerica.co.uk.blahblahblah.secure.com.it
dashes-bad-for-seo.but-technically-still-allowed.not-in-front-or-end
bit.ly
is.gd
foo.biz.pl
google.com.cn
stackoverflow.co.uk
level_three.sub_domain.example.com
www.thelongestdomainnameintheworldandthensomeandthensomemoreandmore.com
https://www.stackoverflow.co.uk?q=mysearch&and=more
foo://5th.4th.3rd.example.com:8042/over/there
foo://subdomain.example.com:8042/over/there?name=ferret#nose
example.com
www.example.com
example.co.uk
trailing-slash.com/
trailing-pound.com#
trailing-question.com?
probably-not-valid.com.cn?&#
probably-not-valid.com.cn/?&#
example.com/page
example.com?key=value

* NOTE: Punycode (Unicode in URLs) is handled just fine with \w, no extra sauce needed
xn--fsqu00a.xn--0zwm56d.com
xn--diseolatinoamericano-66b.com

Emails : Should ALL MATCH

first.name@google1.co.com
foo@us.industries.com,
foobar@tm.valves.net,
andfoo@ge.test.com
jane.doe@my-bank.no
john.doe@spam.com
jane.ann.doe@sandnes.district.gov

Non-Compliant Domains : Should NOT MATCH

either not long-enough (domain min length 2), or too long (64)

v.gd
thing.y
0123456789012345678901234567890123456789012345678901234567891234.com
its-sixty-four-instead-of-sixty-three!.com
symbols-not-allowed@.com
symbols-not-allowed#.com
symbols-not-allowed$.com
symbols-not-allowed%.com
symbols-not-allowed^.com
symbols-not-allowed&.com
symbols-not-allowed*.com
symbols-not-allowed(.com
symbols-not-allowed).com
symbols-not-allowed+.com
symbols-not-allowed=.com

TBD Not handled:

* dashes as start or ending is disallowed (dropped from Regex for readability)
-junk-.com 
* is underscore allowed? I don't know... (but using \w instead of [a-zA-Z0-9\-] everywhere simplifies the regex)
symbols-not-allowed_.com

* special case localhost?
.localhost

also see:

Domain Name Rules :: Super handy ASCII Diagram of a URL

see: https://stackoverflow.com/a/66660651/738895 *

Side note: the lazy quantifier '?' on the subdomain group ({0,127}?) is currently needed for the cases with country codes (example: stackoverflow.co.uk).
The pattern matches these, but it does NOT capture each of the N subdomain levels in its own match group; it can only grab the 3rd level.

score 0 · Answer 8 · answered Jan 16 '19 at 12:43


This is a relatively simple regex, and it grabs everything between the @ and the final domain extension (e.g. .com, .org). It allows domain names that are made up of non-word characters, which exist in real-world data.

>>> import re
>>> regex = re.compile(r"^.+@(.+)\.[\w]+$")

>>> regex.findall('jane.doe@my-bank.no')
['my-bank']

>>> regex.findall('john.doe@spam.com')
['spam']

>>> regex.findall('jane.ann.doe@sandnes.district.gov')
['sandnes.district']

answered Jan 16 '19 at 12:43

Renel Chesak


Unfortunately it doesn't work for emails like `yahoo.co.uk`. I just want `yahoo` in this case. – Jack Chi Sep 08 '21 at 01:56

punit choudhary · Answer 9 · 2020-07-25T16:19:15.543

0

I used this regular expression to get the complete domain name: '.*@+(.*)'. Here .*@+ skips all the characters up to and including the @, and the parentheses capture the complete domain name, i.e. the rest of the string (excluding line-break characters).

edited Jul 25 '20 at 16:19

answered Jul 24 '20 at 07:42

punit choudhary


Why `@+`? `@` must be present **once** in an email. Try your regex with `mymail@example.com blah blah` – Toto Jul 24 '20 at 10:41
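The input suggested in the comment can be tried directly in Python. The second pattern, @(\S+), is not from the answer; it is one possible tightening, shown here only as an assumption, that stops at the first whitespace.

```python
import re

text = "mymail@example.com blah blah"

# Pattern from the answer: (.*) captures everything up to the end of the line.
greedy = re.search(r".*@+(.*)", text)
print(greedy.group(1))    # example.com blah blah

# Hypothetical alternative: capture only non-whitespace after a single @.
stricter = re.search(r"@(\S+)", text)
print(stricter.group(1))  # example.com
```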

score 0 · Answer 10 · answered Mar 29 '23 at 15:07


For those looking for something compatible with golang to extract the domain name from an email address with regex: [^\@][a-zA-Z0-9$&+,;=?#|'<>.^*()%!-]+$
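Go's regexp package does not support lookarounds, which is presumably why this pattern avoids the lookbehind used in other answers. The sketch below runs the same pattern in Python, purely to stay consistent with the other examples in this thread. It returns the whole domain (google.com), not just google.

```python
import re

pattern = re.compile(r"[^\@][a-zA-Z0-9$&+,;=?#|'<>.^*()%!-]+$")

# The character class excludes @, so the match can only begin after the @ sign
# and must run to the end of the string.
match = pattern.search("first.name@google.com")
print(match.group())  # google.com
```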

answered Mar 29 '23 at 15:07

Güney Saramalı
