emaillist.txt

1. Saman.desilva@tamucc.edu
2. saman_desilva@tamucc.edu
3. saman&desilva@tamucc.edu
4. Saman.desilva@gmail.com
5. saman@desilva@yahoo.com
6. saman@mail@com
7. saman.desilva@yahoo com

I want to print only the valid email addresses from this file, but I'm having trouble getting it right. So far I have this command, but its output still includes some invalid addresses.

sed -nr '/\w+@\w+\.\w+$/p' emaillist.txt

The output:

Saman.desilva@tamucc.edu
saman_desilva@tamucc.edu
saman&desilva@tamucc.edu
Saman.desilva@gmail.com
saman@desilva@yahoo.com
Jonathan Leffler
mrcf17
  • How thoroughly do you want to validate the emails? There's a fairly famous question, [How do I validate an email address using a regular expression?](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) which has example expressions that are exceedingly complex — and which I'd hate to try to convert to `sed` (even with 'extended regular expressions' enabled), though it might be able to handle it, in theory. – Jonathan Leffler Nov 27 '18 at 03:15
  • Is the output you show the output you expect, or the output you get? You should clarify which email addresses you want delivered as output from the input. One problem is that you're not anchoring the start of the email address with `^`, so `whoever.whom@wherever@somewhere.com` will be reported as OK because `whoever@somewhere.com` is valid (when not prefixed by `whoever.whom@`) but your regex doesn't protect against this. – Jonathan Leffler Nov 27 '18 at 03:25
  • Btw: This is a valid email address, too: `recipient@[1.2.3.4]` – Cyrus Nov 27 '18 at 05:46
  • Possible duplicate of [How to validate an email address using a regular expression?](https://stackoverflow.com/q/201323/608639), [Checking correctness of an email address with a regular expression in Bash](https://stackoverflow.com/q/2138701/608639), etc. – jww Nov 27 '18 at 08:58

1 Answer

First of all, a regular expression that matches all valid email addresses is notoriously complex. I'm going to assume, given the test data, that you're aiming for a much simpler concept of email address validity.

One issue with your regex is that it isn't anchored to the beginning of the line with ^. Without that anchor, the match is free to start partway through the line, so an invalid address like the one with an ampersand in the username still matches: the regex simply matches the part after the ampersand (desilva@tamucc.edu). The address with two @ signs slips through the same way. If we add the ^, we get the following output:

$ sed -nr '/^\w+@\w+\.\w+$/p' emaillist.txt
saman_desilva@tamucc.edu
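
To see why the unanchored pattern let the two invalid lines through, it can help to print only the portion of each line that the regex actually matched. This is a minimal sketch, assuming GNU grep (for -o and \w); matched_part is a hypothetical helper name used only for illustration:

```shell
# Hypothetical helper: print just the text matched by the original,
# unanchored pattern from the question (assumes GNU grep for -o and \w).
matched_part() {
  grep -oE '\w+@\w+\.\w+$'
}

# Two of the invalid addresses from the test data.
printf '%s\n' 'saman&desilva@tamucc.edu' 'saman@desilva@yahoo.com' | matched_part
# prints:
# desilva@tamucc.edu
# desilva@yahoo.com
```

Both invalid lines end in a valid-looking tail, which is all the unanchored pattern needs. Once ^ is added, the match has to start at the first character of the line, and only saman_desilva@tamucc.edu survives, as in the output above.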

That's not right either. The problem now is that \w matches only letters, digits, and underscores, so the addresses with a period in the username are rejected. Periods are the only other "valid" non-alphanumeric character in the usernames in your test data, so we also need to allow them in that part of the pattern. With that change, we get the correct output:

$ sed -nr '/^(\w|\.)+@\w+\.\w+$/p' emaillist.txt
Saman.desilva@tamucc.edu
saman_desilva@tamucc.edu
Saman.desilva@gmail.com
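
If \w isn't available in your sed (it's a GNU extension), the same simplified check can be sketched with POSIX character classes, where [[:alnum:]_] stands in for \w. This assumes a sed that accepts -E for extended regular expressions and a file with one address per line; filter_emails is a hypothetical helper name, and the pattern is still only the simplified notion of validity used above, not full email validation:

```shell
# Hypothetical helper: print only lines that look like user@domain.tld,
# where the username may contain letters, digits, underscores, and dots.
# Reads the named files, or standard input when no file is given.
filter_emails() {
  sed -n -E '/^[[:alnum:]_.]+@[[:alnum:]_]+\.[[:alnum:]_]+$/p' "$@"
}

# The test data from the question, one address per line.
printf '%s\n' \
  'Saman.desilva@tamucc.edu' \
  'saman_desilva@tamucc.edu' \
  'saman&desilva@tamucc.edu' \
  'Saman.desilva@gmail.com' \
  'saman@desilva@yahoo.com' \
  'saman@mail@com' \
  'saman.desilva@yahoo com' | filter_emails
# prints:
# Saman.desilva@tamucc.edu
# saman_desilva@tamucc.edu
# Saman.desilva@gmail.com
```

Putting the dot inside the bracket expression avoids the (\w|\.) alternation; inside brackets the dot is literal and needs no backslash.
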
cody