0

I have the following regex and function to extract emails to an array and while it's working it seems less than optimal to me. Any suggestion to how I might approve this?

@emails = []
matches = @text_document.scan(/\+'(\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+)'/i)
matches.each {|m| m[0].split(',').each {|email| @emails << email  }  }

Specifically I am seeking something better than nested each'es.

Cheers

EDIT To be completely fair since I liked both answers I gave them both a fair run but since concat is slightly faster and shorter I will mark that as the answer.

require 'benchmark'

CONSTANT = 1
BenchTimes = 1_000_000
EMAILS = "+'one.emaili@domain.com,another.email@domain.se'"

def email
end

def bm_concat
  emails = []
  EMAILS.scan(/\+'(\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+)'/i) do |matches|
    matches.each {|m| emails.concat(m.split(','))}
  end

end

def bm_inject
  emails = []
  EMAILS.scan(/\+'(\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+)'/i) do |matches|
    matches.inject([]) {|arr, mails| emails.concat(mails.split(',')) }
  end

end

Benchmark.bmbm do |bm|
  bm.report("inject:") { BenchTimes.times { bm_inject } }
  bm.report("concat:") { BenchTimes.times { bm_concat } }
end

Yields the following output:

Rehearsal -------------------------------------------
inject:  11.030000   0.060000  11.090000 ( 11.145898)
concat:   9.660000   0.050000   9.710000 (  9.761068)
--------------------------------- total: 20.800000sec

              user     system      total        real
inject:  11.620000   0.060000  11.680000 ( 11.795601)
concat:  10.510000   0.050000  10.560000 ( 10.678999)
mhenrixon
  • 6,179
  • 4
  • 40
  • 64
  • 3
    [Don't do it with Regex](http://stackoverflow.com/questions/201323/using-a-regular-expression-to-validate-an-email-address). – Polynomial Jul 31 '12 at 14:09
  • @Polynomial: That linked e-mail regex is *scary*. I don't want to be that library's maintainer. – Linuxios Jul 31 '12 at 14:18
  • 1
    Agreed. Find a library, or just hope you don't get addresses like `hGy∂@olé.museum`. – Linuxios Jul 31 '12 at 14:22
  • I am not worried about the regex at all it's the `matches.each(|m| m[0].split(',').each {}` I want to improve. I couldn't care less if the email is valid or not. – mhenrixon Jul 31 '12 at 15:23
  • @mhenrixon: I changed my answer to answer your refined question. – Linuxios Jul 31 '12 at 23:01

2 Answers2

1

You can refactor the matches.each to this:

matches.each {|m| @emails.concat(m[0].split(','))}
Linuxios
  • 34,849
  • 13
  • 91
  • 116
  • I am not worried about the regex at all it's the matches.each(|m| m[0].split(',').each {} I want to improve. I couldn't care less if the email is valid or not. – mhenrixon Jul 31 '12 at 15:23
  • 1
    This implements the RFC822/RFC2822 email specification. It also marks many as "valid" that are rejected by some mail systems because they're crazy. *Theoretically* you can have spaces in your email address, but I've never seen this used in practice even once. – tadman Jul 31 '12 at 15:57
  • @mhenrixon: Getting rid of the validation and giving you a solution. – Linuxios Jul 31 '12 at 23:01
  • HAH! Now I got 2 great answers and have problems deciding which one I like better. Thanks for the refined answer :) – mhenrixon Aug 01 '12 at 07:17
  • Great not only short but also well performing. I changed the scan to send the matches in the block to get rid of the other array. – mhenrixon Aug 01 '12 at 09:21
1

Use inject - http://ruby-doc.org/core-1.9.3/Enumerable.html#method-i-inject

@emails = matches.inject([]) do |emails, input| 
  emails += input.first.split(',')
end

fyi, the variables passed to the block, emails refers to the empty array passed in, and input refers to each element of matches as you iterate over it.

Edit (How to use inject):

REGEX = /\+'(\S+@\S+|\{(?:\w+, *)+\w+\}@[\w.-]+)'/i
def bm_inject
  emails = EMAILS.scan(REGEX).inject([]) do |arr, mails| 
    arr.concat mails.first.split(',')
  end
end
AJcodez
  • 31,780
  • 20
  • 84
  • 118
  • of course you can do it in one line with braces too for the block – AJcodez Jul 31 '12 at 16:27
  • Thanks for the tip! I gave it a fair run but since @Linuxios solution is slightly faster That will be marked as answer. – mhenrixon Aug 01 '12 at 09:18
  • 1
    good call on the benchmark. Id urge you to revisit the documentation for inject and see my edit so you understand it for the future, cause you're using it wrong! – AJcodez Aug 01 '12 at 21:11
  • Thanks a bunch for that. I promise to learn inject properly! I've been meaning to for a long time but this is the first time it all makes sense to me so again thanks for the edited example. I see where I went wrong. – mhenrixon Aug 02 '12 at 05:21