1

Most of my users have email addresses associated with their profile in /etc/passwd. They are always in the 5th field, which I can grab, but they appear at different places within a comma-separated list in the 5th field.

Can somebody give me a regex to grab just the email address (delimited by commas) from a line in this file? (I will be using grep and sed from a bash script.)

Sample lines from file:

user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash
Arnab Nandy
Brent
  • [http://www.regular-expressions.info/email.html](http://www.regular-expressions.info/email.html) – Michael Myers Sep 18 '08 at 18:30
  • Does this answer your question? [How to validate an email address using a regular expression?](https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression) – miken32 Nov 26 '19 at 03:55

7 Answers

6

What about:

,([^,:@]+@[^,:]+)

Where the group contains the email address.

[Updated based upon comment that address doesn't always get terminated by a comma]
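A pattern of this shape can be tried from a shell script roughly as follows. This is only a sketch: it assumes the local part is barred from containing commas or colons (`[^,:@]+`), so the match cannot begin at an earlier comma in the field, and it uses the two sample lines from the question instead of the real /etc/passwd. Since `grep -o` has no capture groups, the leading comma is removed with `cut`.

```shell
# Sample lines copied from the question; a real script would read /etc/passwd.
sample='user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# grep -o prints only the matched text, which still includes the leading
# comma, so cut -c2- drops the first character of each match.
emails=$(printf '%s\n' "$sample" | grep -oE ',[^,:@]+@[^,:]+' | cut -c2-)
printf '%s\n' "$emails"
```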

Arnab Nandy
Ray Hayes
5

Actually, this looks like a perfect job for Awk. Now, like most people I will say "I'm no expert in Awk" before proceeding...

awk -F : '{print $5}' /etc/passwd

would get the 5th field where ':' is the field separator from /etc/passwd - it's probably the 5th field you are wanting.

awk -F , '{print $1}'

would get the 1st field from standard input where ',' is the delimiter, so

awk -F : '{print $5}' /etc/passwd | awk -F , '{print $1}'

would get the first comma separated field (the Name field) from the fifth colon separated field (the field with all that kind of cruft in it!) in your /etc/passwd file.

Adjust the print $1 to get the field with your emails in it.

Doubtless there is a way to do this without the pipe in Awk. I use Awk for splitting out fields in things and not much else. I find it confusing, and that's from somebody that loves regular expressions...
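One possible way to drop the pipe, and to cope with the address sitting in a different comma-separated position on each line, is to split the fifth field inside a single Awk program and print whichever piece contains an `@`. This is a sketch under that assumption; the function name `emails_from_gecos` is made up for illustration, and the input is the question's sample rather than the real file.

```shell
# Sample lines copied from the question; a real script would read /etc/passwd.
sample='user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# emails_from_gecos (hypothetical name): split field 5 on commas and print
# every piece that contains an "@", wherever it sits in the list.
emails_from_gecos() {
  awk -F: '{
    n = split($5, parts, ",")
    for (i = 1; i <= n; i++)
      if (parts[i] ~ /@/) print parts[i]
  }'
}

out=$(printf '%s\n' "$sample" | emails_from_gecos)
printf '%s\n' "$out"
```

Against the real file the call would look like `emails_from_gecos < /etc/passwd`.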

reefnet_alex
  • This will only work if the address is always in the same comma delimited field - which the question states, it is not. – Brent Sep 18 '08 at 18:47
  • This is true, I had seen different places but not interpreted it as in different comma delimited fields, but looking at the example it all becomes clear. My bad. – reefnet_alex Sep 18 '08 at 19:05
2
sed -r -e "s/^.*[,:]([^,:]+@[^,:]+).*$/\1/g" /etc/passwd

Will do the trick
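Run as written, sed passes lines through unchanged when the substitution does not match, so accounts without an address would show up as full passwd lines in the output. A hedged variant that prints only the lines where an address was found might look like this; the `nobody` entry is an invented example of such an account, not taken from the question.

```shell
# The user1/user2 lines come from the question; the "nobody" line is a
# made-up entry without an email address, added to show the difference.
sample='user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
nobody:x:65534:65534:Nobody,,,:/nonexistent:/usr/sbin/nologin
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# -n turns off automatic printing and the trailing p prints a line only when
# the substitution succeeded. -E is accepted by both GNU and BSD sed, whereas
# -r is the GNU spelling.
found=$(printf '%s\n' "$sample" | sed -n -E 's/^.*[,:]([^,:]+@[^,:]+).*$/\1/p')
printf '%s\n' "$found"
```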

Brent
1

Search for all email-valid-characters before and after the @ sign. Like:

[-A-z0-9.]+@[-A-z0-9.]+

Greedy matching should pull in everything it can, and it'll stop at the commas or colons.

Check which characters are valid in email addresses, though. I've left some out (like +).
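As a rough illustration of this whitelist approach with `grep -o`: in ASCII the range `A-z` also spans the punctuation between `Z` and `a` (`[`, `\`, `]`, `^`, `_` and the backtick), so the sketch below spells the range out as `A-Za-z` and adds `_` and `+` to the local part only. That character set is an assumption for the example and still narrower than what the RFCs allow.

```shell
# Sample lines copied from the question; a real script would read /etc/passwd.
sample='user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# Local part: letters, digits, "-", ".", "_" and "+".
# Domain part: letters, digits, "-" and ".".
# -o prints only the matched text, so the commas and colons on either side
# of the address are never part of the output.
emails=$(printf '%s\n' "$sample" | grep -oE '[-A-Za-z0-9._+]+@[-A-Za-z0-9.]+')
printf '%s\n' "$emails"
```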

Arnab Nandy
JBB
  • Probably easier to state what you don't want rather than try to work out what is valid. In this case he didn't want commas (if that is valid in an email address then I think he's out of luck for RegExpr). [^,]+ will do in this case. – Ray Hayes Sep 18 '08 at 18:33
  • Actually I put underscores in there. That's why the ]+@[-A-z0-9. is italicized. :) – JBB Sep 18 '08 at 18:33
  • Actually there are other characters besides '_' that are legal. See RFC 2821 and RFC 2822 for details. – Craig Trader Sep 18 '08 at 18:45
  • You can keep the right hand side (after the @) as [-A-Za-z0-9.]+ , since FQDNs can only consist legally of those characters. The left hand side has a much broader set of legal characters, per the RFCs. – nsayer Sep 18 '08 at 23:03
0
sed 's/,*:\/.*//;s/^.*://;s/.*,//' /etc/passwd
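Read step by step, the three substitutions cut the line at the home directory, strip the leading fields, and then keep only what follows the last comma. The sketch below runs them one at a time on a sample line from the question. It suggests the command relies on the address being the last non-empty comma-separated entry in the field; if another entry followed it, that entry would be printed instead.

```shell
# One sample line from the question.
line='user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# Step 1: delete any trailing commas plus everything from ":/" (the home
# directory) to the end of the line.
step1=$(printf '%s\n' "$line" | sed 's/,*:\/.*//')

# Step 2: delete everything up to the last remaining colon, leaving field 5.
step2=$(printf '%s\n' "$step1" | sed 's/^.*://')

# Step 3: delete everything up to the last comma, leaving the final entry.
step3=$(printf '%s\n' "$step2" | sed 's/.*,//')

printf '%s\n' "$step1" "$step2" "$step3"
```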
ghostdog74
-1
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

should catch most emails.
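The `(?:...)` groups in this pattern are Perl-style syntax, which the POSIX basic and extended regular expressions used by sed and plain grep do not define. One way to try it from a shell script, assuming a GNU grep built with `-P` support, is sketched below; the pattern lists lower-case letters only, so `-i` is added.

```shell
# Sample lines copied from the question; a real script would read /etc/passwd.
sample='user1:x:1147:5005:User One,Department,,,email@domain.org:/home/directory:/bin/bash
user2:x:1148:5002:User Two,Department2,email2@gmail.com,:/home/directory:/bin/bash'

# A quoted here-document keeps the quote and backtick characters in the
# pattern literal, so no shell escaping is needed.
pattern=$(cat <<'EOF'
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?
EOF
)

# -P selects Perl-compatible syntax (GNU grep only), -o prints just the
# matched text, -i makes the match case-insensitive.
emails=$(printf '%s\n' "$sample" | grep -oiP -e "$pattern")
printf '%s\n' "$emails"
```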

UnkwnTech
-1

How about the standard RFC 2822:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Yep. That's it. :)

Community
Marcio Aguiar
  • ... actually, a full implementation of that RFC is somewhat more... complex: http://ex-parrot.com/~pdw/Mail-RFC822-Address.html – dsm Sep 20 '08 at 13:12