I'd use something like:
REGEX = /\.(?:mil|gov)$/
%w[
jane.doe@navy.mil
barak.obama@whitehouse.gov
john.doe@usa.army.mil
family@example.com
].each do |addr|
puts '"%s" %s' % [addr, (addr[REGEX] ? 'matches' : "doesn't match")]
end
# >> "jane.doe@navy.mil" matches
# >> "barak.obama@whitehouse.gov" matches
# >> "john.doe@usa.army.mil" matches
# >> "family@example.com" doesn't match
If you know the TLD you want is always at the end of the string, then a simple pattern that matches just that is fine.
This works because addr[REGEX] uses String's [] method, which applies the pattern to the string and returns the match or nil:
'foo'[/oo/] # => "oo"
'bar'[/oo/] # => nil
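One caveat on "the end of the string": in Ruby, `$` anchors at the end of any line, not only at the end of the whole string, so a value with an embedded newline can still match. Here is a minimal sketch of the difference, assuming the input might contain newlines. `LINE_END`, `STRING_END` and `sneaky` are names made up for this illustration:

```ruby
LINE_END   = /\.(?:mil|gov)$/   # $ matches just before any newline, too
STRING_END = /\.(?:mil|gov)\z/  # \z matches only at the very end of the string

# Hypothetical input with a second "line" smuggled in after a newline.
sneaky = "jane.doe@navy.mil\nfamily@example.com"

sneaky[LINE_END]                # => ".mil"
sneaky[STRING_END]              # => nil
'jane.doe@navy.mil'[STRING_END] # => ".mil"
```

If the addresses are already known to be single-line strings, `$` behaves the same as `\z` here.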
If you want to capture everything before the TLD:
REGEX = /(.+)\.(?:mil|gov)$/
%w[
jane.doe@navy.mil
barak.obama@whitehouse.gov
john.doe@usa.army.mil
family@example.com
].each do |addr|
puts addr[REGEX, 1]
end
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> john.doe@usa.army
# >>
Using it in a more "production-worthy" style:
SELECT_PATTERN = '\.(?:mil|gov)$' # => "\\.(?:mil|gov)$"
CAPTURE_PATTERN = "(.+)#{ SELECT_PATTERN }" # => "(.+)\\.(?:mil|gov)$"
SELECT_REGEX, CAPTURE_REGEX = [SELECT_PATTERN, CAPTURE_PATTERN].map{ |s|
Regexp.new(s)
}
SELECT_REGEX # => /\.(?:mil|gov)$/
CAPTURE_REGEX # => /(.+)\.(?:mil|gov)$/
addrs = %w[
jane.doe@navy.mil
barak.obama@whitehouse.gov
john.doe@usa.army.mil
family@example.com
].select{ |addr|
addr[SELECT_REGEX]
}.map { |addr|
addr[CAPTURE_REGEX, 1]
}
puts addrs
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> john.doe@usa.army
Similarly, you could do it without a regular expression:
TLDs = %w[.mil .gov]
%w[
jane.doe@navy.mil
barak.obama@whitehouse.gov
john.doe@usa.army.mil
family@example.com
].each do |addr|
puts '"%s" %s' % [ addr, TLDs.any?{ |tld| addr.end_with?(tld) } ]
end
# >> "jane.doe@navy.mil" true
# >> "barak.obama@whitehouse.gov" true
# >> "john.doe@usa.army.mil" true
# >> "family@example.com" false
And, to capture everything before the TLD without a regular expression:
TLDs = %w[.mil .gov]
addrs = %w[
jane.doe@navy.mil
barak.obama@whitehouse.gov
john.doe@usa.army.mil
family@example.com
].select{ |addr|
TLDs.any?{ |tld| addr.end_with?(tld) }
}.map { |addr|
addr.split('.')[0..-2].join('.')
}
puts addrs
# >> jane.doe@navy
# >> barak.obama@whitehouse
# >> john.doe@usa.army
end_with? returns true or false depending on whether the string ends with the given substring, which is faster than using the equivalent regular expression. any? looks through the array for any element that satisfies the condition and returns true or false.
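As a side note, end_with? accepts several suffixes at once and returns true if the string ends with any of them, so the any? loop can be folded into a single call. This is a small sketch; `TLD_SUFFIXES`, `sample` and `results` are names chosen just for this example:

```ruby
TLD_SUFFIXES = %w[.mil .gov]

sample = %w[
  jane.doe@navy.mil
  family@example.com
]

# The splat passes each suffix as a separate argument to end_with?.
results = sample.map { |addr| addr.end_with?(*TLD_SUFFIXES) }
results # => [true, false]
```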
If you have a long list of TLDs to check, using a well-written regular expression can be very fast, possibly faster than using any?. It all depends on your data and the number of TLDs to check, so you'd need to run benchmarks against a sampling of your data to see which way to go.
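A rough sketch of such a benchmark, using Ruby's standard Benchmark module. The suffix list (including the extra `.edu` and `.int` entries), the sample addresses, and the iteration count `N` are placeholders for illustration only; swap in a sampling of your real data before drawing conclusions:

```ruby
require 'benchmark'

# Hypothetical data for illustration; use your own TLDs and addresses.
SUFFIXES = %w[.mil .gov .edu .int]
SAMPLE = %w[
  jane.doe@navy.mil
  barak.obama@whitehouse.gov
  john.doe@usa.army.mil
  family@example.com
]

# Regexp.union escapes each string, so the dots match literally.
SUFFIX_REGEX = /(?:#{Regexp.union(SUFFIXES).source})\z/

N = 50_000

Benchmark.bm(10) do |bm|
  bm.report('regex') do
    N.times { SAMPLE.each { |addr| addr.match?(SUFFIX_REGEX) } }
  end
  bm.report('any?') do
    N.times { SAMPLE.each { |addr| SUFFIXES.any? { |tld| addr.end_with?(tld) } } }
  end
end
```

No timings are shown here because they vary with the Ruby version, the machine, and the shape of the data.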