Experts, I am trying to count the e-mail addresses and the number of their repetitions in the maillog file. I am able to do this using a regular expression with (re.search) or (re.match), but I would like to accomplish it with (re.findall), which I am currently dabbling with. I would appreciate any suggestions.

1) Code Line ...

# cat maillcount31.py
#!/usr/bin/python
import re
#count = 0
mydic = {}
counts = mydic
fmt = " %-32s %-15s"
log =  open('kkmail', 'r')

for line in log.readlines():
        myre = re.search('.*from=<(.*)>,\ssize', line)
        if myre:
           name = myre.group(1)
           if name not in mydic.keys():
              mydic[name] = 0
           mydic[name] +=1

for key in counts:
   print  fmt % (key, counts[key])

2) Output from the Current code..

# python maillcount31.py
 root@MyServer1.myinc.com         13
 User01@MyServer1.myinc.com       14
Karn Kumar
  • You did not say exactly where you are having problems, so without analysing your code, I'd say you want to have two things: 1. a function which iterates through all appearances of email addresses, and 2. a `collections.Counter` which counts them. – zvone Dec 23 '15 at 20:29

3 Answers


Hope this helps...

import re
from collections import Counter

# Adjust the pattern to your file or line format.
# If you call findall() on each line, combine the returned lists before counting.
emails = re.findall(r'.*from=<(.*)>,\ssize', line)  # line: one line read from the log
result = Counter(emails)  # type is <class 'collections.Counter'>
dict(result)  # convert to a regular dict

re.findall() returns a list. The question "How can I count the occurrences of a list item in Python?" covers other ways to count the items in the returned list.

By the way, an interesting feature of Counter is that two counters can be added together:

>>> tmp1 = Counter(re.findall('from=<([^\s]*)>', "from=<usr1@gmail.com>, from=<usr2@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>,") )
>>> tmp1
Counter({'usr1@gmail.com': 4, 'usr2@gmail.com': 1})
>>> tmp2 = Counter(re.findall('from=<([^\s]*)>', "from=<usr2@gmail.com>, from=<usr3@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>, from=<usr1@gmail.com>,") )
>>> dict(tmp1+tmp2)
{'usr2@gmail.com': 2, 'usr1@gmail.com': 7, 'usr3@gmail.com': 1}

So, if the file is very large, we can count the matches on each line and combine the per-line counts with the aid of Counter.
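The per-line approach described above can be sketched as follows. This is only an illustration under some assumptions: the names `FROM_RE` and `count_senders` and the sample log lines are made up here, and the pattern is a slightly tighter variant of the one in the question (`[^>\s]*` instead of `.*`).

```python
import re
from collections import Counter

# Compiled once and reused for every line.
# Assumption: the pattern is adapted from the one in the question.
FROM_RE = re.compile(r'from=<([^>\s]*)>,\ssize')

def count_senders(lines):
    """Tally sender addresses across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # findall() returns the list of captured addresses on this line;
        # update() adds them to the running tally.
        counts.update(FROM_RE.findall(line))
    return counts

# Hypothetical sample lines, made up for illustration.
sample = [
    "Dec 23 10:00:01 host postfix/qmgr[1]: A1: from=<root@MyServer1.myinc.com>, size=512, nrcpt=1",
    "Dec 23 10:00:02 host postfix/qmgr[1]: A2: from=<User01@MyServer1.myinc.com>, size=700, nrcpt=1",
    "Dec 23 10:00:03 host postfix/qmgr[1]: A3: from=<root@MyServer1.myinc.com>, size=300, nrcpt=1",
]
counts = count_senders(sample)
for address, n in counts.most_common():
    print(" %-32s %-15s" % (address, n))
```

Because a file object is itself an iterable of lines, something like `with open('kkmail') as log: counts = count_senders(log)` should work on the real file without loading it all into memory.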

BAE
  • @ Pei - Cool explanation. Though my posted code is working, I am just looking at how to fit `(re.findall)` into the code I posted. – Karn Kumar Dec 23 '15 at 22:31
  • @pygo http://stackoverflow.com/questions/8110059/python-regex-search-and-findall – BAE Dec 23 '15 at 22:38
  • http://stackoverflow.com/questions/452104/is-it-worth-using-pythons-re-compile – BAE Dec 23 '15 at 22:43

Have you considered using pandas? It can give you a nice table of results without the need for regex commands.

 import pandas as pd

 # email_list: the addresses already extracted from the log
 emails = pd.Series(email_list)
 individual_emails = emails.unique()

 # makes a table with emails and a zeroed tally
 tally = pd.DataFrame({'email': individual_emails, 'count': 0},
                      columns=['email', 'count'])

 for item in range(len(tally)):
      address = tally.iloc[item, 0]
      # number of entries equal to this address
      tally.iloc[item, 1] = len(emails[emails == address])

 # emails.value_counts() gives the same tally in one call
 print(tally)
user2589273
  • @ user2589273 - I don't have the pandas module installed. Thanks for the help though, I will try that later once I have pandas available. – Karn Kumar Dec 23 '15 at 20:55

I hope the code at the bottom helps.

However, here are three things to generally note:

  1. Use (with) when opening files
  2. When iterating over dictionaries, use iteritems() (items() in Python 3)
  3. When working with containers, collections are your best friend

#!/usr/bin/python
import re
from collections import Counter

fmt = " %-32s %-15s"
filename = 'kkmail'

# Extract the email addresses
email_list = []
with open(filename, 'r') as log:
   for line in log.readlines():
      _re = re.search(r'.*from=<(.*)>,\ssize', line)
      if _re:
         name = _re.group(1)
         email_list.append(name)

# Count the email addresses
counts = dict(Counter(email_list)) # List to dict of counts: {'a':3, 'b':7,...}
for key, val in counts.iteritems():
   print  fmt % (key, val)
Hill
  • @ Hill - Thanks for the code. This is working, but I am looking for a solution with the `re.findall()` function using `re.compile`, so that we can compile the regex once and not loop it over the file again. Though my posted code is working fine. – Karn Kumar Dec 23 '15 at 22:22
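
What the comment above asks for, a pattern compiled once with `re.compile` and a single `findall()` call over the whole file contents, might look roughly like the sketch below. The names `SENDER_RE` and `tally_senders` and the sample log text are assumptions made up for illustration, and the pattern is a tightened variant of the one in the question rather than the original.

```python
import re
from collections import Counter

# Compiled once and reused. There is no leading ".*", so findall() can
# return several matches from a single multi-line string.
SENDER_RE = re.compile(r'from=<([^>\s]*)>,\ssize')

def tally_senders(text):
    """Return a Counter of sender addresses found anywhere in text."""
    return Counter(SENDER_RE.findall(text))

# Hypothetical log contents, made up for illustration.
log_text = (
    "A1: from=<root@MyServer1.myinc.com>, size=512, nrcpt=1\n"
    "A2: to=<someone@example.com>, relay=local, status=sent\n"
    "A3: from=<User01@MyServer1.myinc.com>, size=700, nrcpt=1\n"
    "A4: from=<root@MyServer1.myinc.com>, size=300, nrcpt=1\n"
)
totals = tally_senders(log_text)
for address in sorted(totals):
    print(" %-32s %-15s" % (address, totals[address]))
```

For the real file, `with open('kkmail') as log: totals = tally_senders(log.read())` would be the equivalent call. Note that `read()` loads the entire file into memory, so for very large logs iterating line by line is likely the lighter option.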