
I have a function that generates a random email address:

def emails
  names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
  surnames = ["oak", "leaf", "grass", "fruit"]
  providers = ["gmail", "yahoo", "outlook", "icloud"]
  "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

Given a list of randomly generated email addresses:

email_list = 100.times.map { emails }

that looks like this:

daniel.oak3985@icloud.com
ramzes.grass1166@icloud.com
daniel.fruit992@yahoo.com
...

how can I select the most common provider ("gmail", "yahoo", etc.)?

Wayne Conrad
Ewelina

2 Answers


Your question is similar to this one. There's a twist, though: you don't want to analyze the frequency of the email addresses themselves, but of their providers.

def random_email
  names = ["alfred", "daniel", "elisa", "ana", "ramzes"]
  surnames = ["oak", "leaf", "grass", "fruit"]
  providers = ["gmail", "yahoo", "outlook", "icloud"]
  "#{names.sample}.#{surnames.sample}#{rand(100..5300)}@#{providers.sample}.com"
end

emails = Array.new(100){ random_email }

# Hash.new(0) makes every missing provider start at a count of 0.
freq = emails.each_with_object(Hash.new(0)) do |email, counts|
  provider = email.split('@').last
  counts[provider] += 1
end

p freq
#=> {"outlook.com"=>24, "yahoo.com"=>28, "gmail.com"=>32, "icloud.com"=>16}

p freq.max_by{|provider, count| count}.first
#=> "gmail.com"
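On Ruby 2.7 or later, Enumerable#tally builds the same frequency hash in one call. A minimal sketch, assuming Ruby 2.7+ and using a small hypothetical fixed list instead of random data so the result is reproducible:

```ruby
# Hypothetical fixed sample so the outcome does not depend on randomness.
emails = ["alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
          "daniel.fruit1600@outlook.com", "ramzes.fruit435@gmail.com",
          "ana.fruit4233@yahoo.com"]

# Extract the provider part, then count occurrences (Enumerable#tally, Ruby 2.7+).
counts = emails.map { |email| email.split('@').last }.tally

# Pick the [provider, count] pair with the largest count and keep the provider.
most_common = counts.max_by { |_provider, count| count }.first

p most_common  # prints "gmail.com"
```

Like the each_with_object version, max_by returns a single pair, so ties are not reported.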
Eric Duminil
email_list = 10.times.map { emails }
  #=> ["alfred.grass426@gmail.com", "elisa.oak239@icloud.com",
  #    "daniel.fruit1600@outlook.com", "ana.fruit3761@icloud.com",
  #    "daniel.grass742@yahoo.com", "elisa.oak3891@outlook.com",
  #    "alfred.leaf1321@gmail.com", "alfred.grass5295@outlook.com",
  #    "ramzes.fruit435@gmail.com", "ana.fruit4233@yahoo.com"] 

email_list.group_by { |s| s[/@\K.+/] }.max_by { |_,v| v.size }.first
  #=> "gmail.com"

\K in the regex means "disregard everything matched so far": the @ must still be matched, but it is dropped from the returned match, leaving only the domain. Alternatively, @\K could be replaced by the positive lookbehind (?<=@).
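A short comparison of the three patterns on one address taken from the list above; String#[] with a regex returns the matched substring:

```ruby
s = "alfred.grass426@gmail.com"

# Plain match: the "@" is part of the returned text.
with_at = s[/@.+/]

# \K: "@" must match, but everything before \K is dropped from the result.
keep_out = s[/@\K.+/]

# Positive lookbehind: ".+" must be preceded by "@", which is never consumed.
lookbehind = s[/(?<=@).+/]

p with_at     # prints "@gmail.com"
p keep_out    # prints "gmail.com"
p lookbehind  # prints "gmail.com"
```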

The steps are as follows.

h = email_list.group_by { |s| s[/@\K.+/] }
  #=> {"gmail.com"  =>["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                    "ramzes.fruit435@gmail.com"],
  #    "icloud.com" =>["elisa.oak239@icloud.com", "ana.fruit3761@icloud.com"],
  #    "outlook.com"=>["daniel.fruit1600@outlook.com",  "elisa.oak3891@outlook.com",
  #                    "alfred.grass5295@outlook.com"],
  #    "yahoo.com"  =>["daniel.grass742@yahoo.com", "ana.fruit4233@yahoo.com"]}
a = h.max_by { |_,v| v.size }
  #=> ["gmail.com", ["alfred.grass426@gmail.com", "alfred.leaf1321@gmail.com",
  #                  "ramzes.fruit435@gmail.com"]] 
a.first
  #=> "gmail.com" 

If, as here, there is a tie for most frequent (gmail.com and outlook.com each appear three times), max_by returns only one of the winners. To get all of them, modify the code as follows.

h = email_list.group_by { |s| s[/@\K.+/] }
  # (same as above)
mx_size = h.map { |_,v| v.size }.max
  #=> 3 
h.select { |_,v| v.size == mx_size }.keys
  #=> ["gmail.com", "outlook.com"] 
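The tie-aware version can be wrapped in a small helper. A sketch assuming Ruby 2.7+ (for Enumerable#tally, which replaces group_by plus size here); the method name most_common_providers and the sample list are hypothetical:

```ruby
# Hypothetical helper: returns every provider tied for the highest count.
def most_common_providers(emails)
  # Count occurrences of each domain, using the same \K regex as above.
  counts = emails.map { |email| email[/@\K.+/] }.tally
  max = counts.values.max
  # Keep every provider whose count equals the maximum (empty input gives []).
  counts.select { |_provider, count| count == max }.keys
end

list = ["a@gmail.com", "b@outlook.com", "c@gmail.com", "d@outlook.com", "e@yahoo.com"]
p most_common_providers(list)  # prints ["gmail.com", "outlook.com"]
```

Because tally and select both preserve insertion order, the winners come back in the order they first appear in the list.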
Cary Swoveland