I've received a list of email addresses that I'd like to run an email campaign on. However, the list also contains some URLs, which complicates things.

The email addresses follow the standard format, for example:

news@ydr.com

I'd like to paste the list into the terminal and run a command that captures ONLY the email addresses, drops any URLs, and saves the result to a file.

Please advise! It is much appreciated :)

Apane101
  • Can you post an example of a URL? Are they just domain names, or do they start with a protocol? What about the emails? Are simple examples enough, or do you need to extract all possible edge cases? – spickermann Apr 14 '18 at 05:41
  • A line might look like this: editors@informationweek.com https://www.techdirt.com/submitstory.php. In that case I'd only want the editors@informationweek.com email and would remove the URL. – Apane101 Apr 14 '18 at 16:48

1 Answer

If you just want to catch most email addresses, this regex might work. I took it from "How to validate an email address using a regular expression?", where they also discuss the much more complicated RFC 822 email regex.

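#!/usr/bin/env ruby

# Read everything pasted on stdin; press Ctrl+D after pasting to signal EOF.
input = $stdin.readlines
input.each do |line|
  # Print only lines that consist of a single email address.
  puts line if line[/^[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/]
end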

# test input
# foo@bar.com
# www.cnn.com
# test.email@go.com
# turdburgler@mcdo.net
# http://www.google.com
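
To see what the pattern accepts without pasting anything, a quick check along these lines may help. It is only a sketch: the names EMAIL_LINE and email_line? are made up for illustration, and the last sample is the mixed email-plus-URL line from the comments on the question. Because the pattern is anchored with ^ and $, a line that holds an address followed by a URL does not match as a whole.

```ruby
# Hypothetical names for illustration; the pattern is the one from the script.
EMAIL_LINE = /^[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/

# True when the whole line is a single email address.
def email_line?(line)
  !line[EMAIL_LINE].nil?
end

samples = [
  "foo@bar.com",
  "www.cnn.com",
  "test.email@go.com",
  "http://www.google.com",
  "editors@informationweek.com https://www.techdirt.com/submitstory.php"
]

# Label each sample so it is easy to see which lines the script would keep.
samples.each do |line|
  puts "#{email_line?(line) ? 'keep' : 'drop'}  #{line}"
end
```

Only the two lines that are nothing but an address are labelled keep; the mixed line is dropped along with the bare URLs.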

To write emails to a file:

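#!/usr/bin/env ruby

# Read everything pasted on stdin; press Ctrl+D after pasting to signal EOF.
input = $stdin.readlines
# The block form of File.open closes emails.txt automatically.
File.open("emails.txt", "w") do |file|
  input.each do |line|
    # Write only lines that consist of a single email address.
    file.write(line) if line[/^[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+$/]
  end
end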

Just to be clear, this is a Ruby script. Save it as a file, e.g. email_parser.rb, then make it executable and run it like this:

chmod +x email_parser.rb
./email_parser.rb # this waits for stdin; paste the list into the terminal here

While the terminal is waiting for input, paste the list in, then press Ctrl+D to signal end of input (EOF). The script then goes through the pasted lines and keeps those that are email addresses. With the second script, the output is a file called emails.txt in the directory you ran the script from.
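
If some lines contain an address and a URL together, as in the example from the comments, one possible variation is to pull the addresses out of the text with String#scan instead of testing whole lines. This is a sketch under assumptions, not a full validator: EMAIL_ANYWHERE and extract_emails are hypothetical names, the pattern is the same one with the ^ and $ anchors dropped, and a URL that itself contains something like user@host.com would still be picked up.

```ruby
# Hypothetical names; same pattern as above, without the ^ and $ anchors,
# so an address can be found anywhere in a line.
EMAIL_ANYWHERE = /[a-zA-Z0-9_.+\-]+@[a-zA-Z0-9\-]+\.[a-zA-Z0-9\-.]+/

# Returns every email-looking substring in the text, without duplicates.
# Trailing dots (e.g. from sentence punctuation) are stripped, since the
# last character class also accepts ".".
def extract_emails(text)
  text.scan(EMAIL_ANYWHERE).map { |email| email.sub(/\.+\z/, "") }.uniq
end

# Sample input built from lines that appear in the question and comments.
sample = <<~LIST
  news@ydr.com
  www.cnn.com
  editors@informationweek.com https://www.techdirt.com/submitstory.php
  http://www.google.com
LIST

emails = extract_emails(sample)
puts emails
```

To run this against a pasted list, the sample could be replaced with $stdin.read, and File.write("emails.txt", emails.join("\n") + "\n") would save the result instead of printing it.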

earlonrails
  • Unfortunately, I still see URLs. Is there any way to run the command so that only the email addresses are kept and saved to a .txt file? – Apane101 Apr 14 '18 at 00:58