
I have many lines of email addresses, and I need to do a couple of things with them:

stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
zqian@umich.edu
rjlowe@iupui.edu 
... etc
  1. Put them all in one list: ['stephen.marquard@uct.ac.za', 'louis@media.berkeley.edu', 'zqian@umich.edu', ...]
  2. Figure out which email address appears most often in that list. This is how I started my code; I hope I can complete it from where I left off:

    fname = raw_input("Enter file name: ")
    if len(fname) < 1 : fname = "mbox-short.txt"
    fh = open(fname)
    lines = []
    count = 0 # For next step
    for line in fh:
        line = line.rstrip()
        if not line.startswith("From ") : continue
        x = line.split()
        emails = x[1]
     #print y
    
    maxapperence = 0 
    famous = None
    for mail in emails:
        count = emails.count(mail)
        if count > maxapperence:
            famous = mail
    print famous
    
    apparence = dict()
    for mail in set(emails):
        apparence[mail] = emails.count(mail)
    print apparence
    

Output:

    stephen.marquard@uct.ac.za
    louis@media.berkeley.edu
    zqian@umich.edu
    rjlowe@iupui.edu
    zqian@umich.edu
    rjlowe@iupui.edu
    cwen@iupui.edu
    cwen@iupui.edu
    gsilver@umich.edu
    gsilver@umich.edu
    zqian@umich.edu
    gsilver@umich.edu
    wagnermr@iupui.edu
    zqian@umich.edu
    antranig@caret.cam.ac.uk
    gopal.ramasammycook@gmail.com
    david.horwitz@uct.ac.za
    david.horwitz@uct.ac.za
    david.horwitz@uct.ac.za
    david.horwitz@uct.ac.za
    stephen.marquard@uct.ac.za
    louis@media.berkeley.edu
    louis@media.berkeley.edu
    ray@media.berkeley.edu
    cwen@iupui.edu
    cwen@iupui.edu
    cwen@iupui.edu
    
Khalida

2 Answers


If you've got a file that only contains email addresses:

import collections

filename = ''  # set this to the path of your file
with open(filename) as f:
    c = collections.Counter(map(str.strip, f))  # count each stripped line
print(c.most_common(10))  # simple example: the 10 most frequent addresses with their counts
chelmertz
  • Sorry, maybe I should have said this from the beginning, but I actually extracted those email addresses from a .txt file! – Khalida Jul 09 '15 at 16:48
  • @Khalida If you already have the email addresses in a list, you can replace `map(....)` in my example with your list: `l = ['a@b.c', 'd@e.f']; c = collections.Counter(l)` – chelmertz Jul 09 '15 at 16:54
  • No, the emails were not in a list, just in a plain text file. – Khalida Jul 09 '15 at 17:26
  • Then my answer should be good enough, I think? Just set `filename`. – chelmertz Jul 09 '15 at 18:04
  • I don't need to go back to the file name; I've already extracted the emails and they print as shown above. Now I need to put them in one list and loop through it to find which address occurs most often and how many times it occurs. – Khalida Jul 11 '15 at 13:36
  • @Khalida: That code seems invalid: `emails` is a string that gets overwritten on each iteration, and in the second loop you treat it as a list. If it were a list, you could just use `c = collections.Counter(emails)`. I'm sorry, but I'm not sure I can help you any more if you don't get what I'm saying over and over again. – chelmertz Jul 13 '15 at 07:39
  • Never mind, I solved the problem myself. Thanks to all. – Khalida Jul 14 '15 at 23:14

First example

emails = """stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
zqian@umich.edu
rjlowe@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
gsilver@umich.edu
gsilver@umich.edu
zqian@umich.edu
gsilver@umich.edu
wagnermr@iupui.edu
zqian@umich.edu
antranig@caret.cam.ac.uk
gopal.ramasammycook@gmail.com
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
louis@media.berkeley.edu
ray@media.berkeley.edu
cwen@iupui.edu
cwen@iupui.edu
cwen@iupui.edu""".split("\n")

maxapperence = 0 
famous = None
for mail in set(emails):
    count = emails.count(mail)
    if count > maxapperence:
        famous = mail
        maxapperence = count
print famous, maxapperence

You can also store the number of appearances of every address:

apparence = dict()
for mail in set(emails):
    apparence[mail] = emails.count(mail)
print apparence
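The loop above calls `emails.count` once per distinct address, and each call rescans the whole list. A sketch of an alternative (Python 3 syntax; the short list is a subset of the question's data) builds the same dictionary in a single pass with `dict.get`, then picks the most frequent address with `max`:

```python
# A few of the addresses from the question, already collected into one list
emails = [
    "david.horwitz@uct.ac.za", "louis@media.berkeley.edu",
    "david.horwitz@uct.ac.za", "ray@media.berkeley.edu",
    "louis@media.berkeley.edu", "david.horwitz@uct.ac.za",
]

apparence = {}
for mail in emails:
    # dict.get returns 0 the first time an address is seen
    apparence[mail] = apparence.get(mail, 0) + 1

famous = max(apparence, key=apparence.get)  # the key with the largest count
print(famous, apparence[famous])
```

The result matches the `emails.count` version; the difference is only that the list is traversed once instead of once per distinct address.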
wilfriedroset
  • I have saved all the emails in 'y'. – Khalida Jul 09 '15 at 17:20
  • Hi wilfriedrt, I'm getting very close. I just need to show the number of times that email appears in the list. At the moment the output shows `cwen@iupui.edu`, while it should be `cwen@iupui.edu 5`, since that address appears 5 times in the list. I tried to use `count = 0` within the same loop, but it didn't loop through! Please help. – Khalida Jul 11 '15 at 12:29
  • You just have to print `maxapperence` as well, since it stores the maximum number of appearances. – wilfriedroset Jul 13 '15 at 08:34
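In line with that comment, the count has to be kept (or recomputed) alongside the address. A compact sketch of the same search using the built-in `max` with a key function, assuming Python 3 and a list like the one in the answer; if several addresses tie for the maximum, which one is returned depends on set iteration order:

```python
emails = [
    "zqian@umich.edu", "rjlowe@iupui.edu", "zqian@umich.edu",
    "gsilver@umich.edu", "zqian@umich.edu", "rjlowe@iupui.edu",
]

# max() calls emails.count on every distinct address and keeps the largest
famous = max(set(emails), key=emails.count)
maxapperence = emails.count(famous)  # how many times that address appears
print(famous, maxapperence)
```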