My question is similar to this but slightly different. I am trying to read through a file, looking for lines that start with 'From' and contain an email address, then build a dictionary of these addresses and report the one that occurs most often.
The lines to look for in the file have this form:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Whenever such a line is found, the email address should be extracted and appended to a list, and the dictionary is then built from that list.
I came across this code sample for printing the key with the maximum value in a dict:
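To make the intended behaviour concrete, here is a minimal sketch of the extraction step on its own. It assumes the address is always the second whitespace-separated field of a 'From ' line. The `sample_lines` list is made up for illustration; only its first entry is the line shown above.

```python
# Hypothetical input: only the first entry is the line shown above,
# the other two are invented to show which lines get skipped.
sample_lines = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "Subject: example",
]

emails = []
for line in sample_lines:
    # "From: ..." does not match because of the colon before the space.
    if not line.startswith("From "):
        continue
    # Assumption: the address is the second whitespace-separated field.
    emails.append(line.split()[1])

print(emails)
```

With this, each matching line should contribute exactly one entry to the list, so 27 matching lines would give a list of 27 addresses.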
counts = dict()
names = ['csev','owen','csev','zqian','cwen']
for name in names:
    counts[name] = counts.get(name,0) + 1
maximum = max(counts, key = counts.get)
print maximum, counts[maximum]
Based on this sample, I then wrote this program:
import re
name = raw_input("Enter file:")
if len(name) < 1 : name = "mbox-short.txt"
handle = open(name)
matches = []
addy = []
counts = dict()
for lines in handle :
    # look for specific characters in document text
    if not lines.startswith("From ") : continue
    # increment the count variable for each match found
    lines.split()
    # append the required lines to the matches list
    matches.append(lines)
    # loop through the list to access each line individually
    for email in matches :
        # place values in variable
        out = email
        # looking through each line for any email address found
        found = re.findall(r'[\w\.-]+@[\w\.-]+', out)
        # loop through the found emails and print them out
        for i in found :
            i.split()
            addy.append(i)
            for i in addy:
                counts[i] = counts.get(i, 0) + 1
maximum = max(counts, key=counts.get)
print counts
print maximum, counts[maximum]
The problem is that only 27 lines in the file start with 'From', and the most frequent address among them should be 'cwen@iupui.edu', which occurs 5 times. But when I run the code, my output is:
{'gopal.ramasammycook@gmail.com': 1640, 'louis@media.berkeley.edu': 7207,
'cwen@iupui.edu': 8888, 'antranig@caret.cam.ac.uk': 1911,
'rjlowe@iupui.edu': 10678, 'gsilver@umich.edu': 10140,
'david.horwitz@uct.ac.za': 4205, 'wagnermr@iupui.edu': 2500,
'zqian@umich.edu': 16804, 'stephen.marquard@uct.ac.za': 7490,
'ray@media.berkeley.edu': 168}
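As a rough sanity check on these numbers (a sketch, and I may be misreading it), I added up the counts. The total is 71631, which equals 378 * 379 / 2, where 378 = 27 * 28 / 2. That pattern would be consistent with the lists being re-scanned from the start on every pass instead of once at the end, though I am not certain that is the cause.

```python
# Counts copied from the output above.
counts = {
    'gopal.ramasammycook@gmail.com': 1640, 'louis@media.berkeley.edu': 7207,
    'cwen@iupui.edu': 8888, 'antranig@caret.cam.ac.uk': 1911,
    'rjlowe@iupui.edu': 10678, 'gsilver@umich.edu': 10140,
    'david.horwitz@uct.ac.za': 4205, 'wagnermr@iupui.edu': 2500,
    'zqian@umich.edu': 16804, 'stephen.marquard@uct.ac.za': 7490,
    'ray@media.berkeley.edu': 168,
}

total = sum(counts.values())

n = 27                                      # lines starting with "From "
appended = n * (n + 1) // 2                 # 1 + 2 + ... + 27
expected = appended * (appended + 1) // 2   # 1 + 2 + ... + appended

print(total)
print(expected)
```

If the counting were done once over 27 addresses, the values in the dictionary should instead sum to 27.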
Here's a link to the text file for reference: text file