I want to build a Python script that greps through the exim logfiles on my SMTP servers and reports the total number of mails sent per destination domain (the most-sent-to domains) every day, week and month. I'm pretty new to Python and I'm struggling with how to achieve this.
The relevant lines in an exim logfile typically look like this:
Feb 24 00:00:23 smtp1.mail.net exim[5660]: 2014-02-24 00:00:23 1Wuniq-mail-idSo-Fg -> someuser@somedomain.com R=mail T=remote_smtp H=smtp.mail.net [000.00.34.17]
Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 1Wuniq-mail-idSm-1h => someuser@somedomain.com R=mail T=pop_mail_net H=mta.mail.net [000.00.34.6]
Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 1Wuniq-mail-idSm-1h Completed
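A minimal sketch of the line filter, assuming every relevant line has the same shape as the samples above. A single regular expression (only the re module) keeps lines whose message ID starts with "1W" and is followed by "=>" or "->", and extracts exim's own date and the recipient domain. Arrival lines ("<="), "Completed" lines and everything else are ignored. The pattern and the name parse_delivery are hypothetical; real "=>" lines can also carry forms such as user <address>, and since the leading characters of exim message IDs are derived from the time the ID was generated, hard-coding "1W" may stop matching at some point.

```python
import re

# Hypothetical pattern for the sample lines: syslog prefix, exim's own
# timestamp, a message ID starting with "1W", then "=>" or "->" and an address.
DELIVERY_RE = re.compile(
    r"exim\[\d+\]: "
    r"(?P<date>\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} "
    r"(?P<msgid>1W\S*) "
    r"(?:=>|->) "
    r"\S+@(?P<domain>[^\s>]+)"
)


def parse_delivery(line):
    """Return (date, domain) for a delivery line, or None for anything else."""
    match = DELIVERY_RE.search(line)
    if match is None:
        return None
    # Domains are case-insensitive, so normalise before counting.
    return match.group("date"), match.group("domain").lower()


if __name__ == "__main__":
    sample = (
        "Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 "
        "1Wuniq-mail-idSm-1h => someuser@somedomain.com R=mail T=pop_mail_net"
    )
    print(parse_delivery(sample))  # ('2014-02-24', 'somedomain.com')
```

Using exim's own date (rather than the syslog prefix) gives a full year, which makes the later weekly and monthly grouping simpler.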
I think the sys, os and re modules should be enough to achieve this, but I'm not sure.
I also want to keep the per-domain counts in a dictionary, because I want to run the script daily from cron and add each run's results to the totals.
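One way to use a dictionary across cron runs, sketched under assumptions: counts are kept per day as {date: {domain: count}} in a JSON file (the name domain_counts.json and all function names are hypothetical), and weekly and monthly totals are derived from the daily buckets instead of being stored separately. collections.Counter, json and datetime are standard library, just like sys, os and re.

```python
import json
import os
from collections import Counter
from datetime import date

# Hypothetical state file: {"YYYY-MM-DD": {"domain": count, ...}, ...}
COUNTS_FILE = "domain_counts.json"


def load_counts(path=COUNTS_FILE):
    """Load the per-day counts saved by a previous cron run (empty if none)."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return {day: Counter(domains) for day, domains in json.load(f).items()}


def save_counts(counts, path=COUNTS_FILE):
    """Write the per-day counts back; Counter serialises like a plain dict."""
    with open(path, "w") as f:
        json.dump(counts, f, indent=2, sort_keys=True)


def add_deliveries(counts, deliveries):
    """Add an iterable of (iso_date, domain) pairs to the daily buckets."""
    for day, domain in deliveries:
        counts.setdefault(day, Counter())[domain] += 1
    return counts


def rollup(counts, period):
    """Sum daily counts into 'day', 'week' (ISO year-week) or 'month' buckets."""
    totals = {}
    for day, domains in counts.items():
        d = date.fromisoformat(day)
        if period == "day":
            key = day
        elif period == "week":
            year, week, _ = d.isocalendar()
            key = f"{year}-W{week:02d}"
        elif period == "month":
            key = d.strftime("%Y-%m")
        else:
            raise ValueError(f"unknown period: {period}")
        totals.setdefault(key, Counter()).update(domains)
    return totals


if __name__ == "__main__":
    counts = add_deliveries(load_counts(), [
        ("2014-02-24", "somedomain.com"),
        ("2014-02-25", "example.org"),
    ])
    save_counts(counts)
    for period in ("day", "week", "month"):
        for key, domains in sorted(rollup(counts, period).items()):
            print(period, key, domains.most_common(3))
```

Storing only daily buckets means a week or month that spans several runs is always recomputed from the same data, so nothing is counted twice as long as each log line is read only once.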
The main problem is: how can I grep through only the relevant lines? Exim logs a lot, and I only want the delivery lines containing "=>" or "->" together with the unique mail ID, which starts with "1W". Also, for the daily run the script must "tail" through the logfiles and start at the position where it stopped parsing the last time it was executed. This is necessary to get a reliable daily count of sent-to domains.
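For the "tail" part, a minimal sketch that remembers the byte offset reached in the previous run, assuming one log file per server and a writable state file (exim_tail_state.json and read_new_lines are hypothetical names). It also stores the file's inode, so a rotated or truncated log is read again from the start; lines written to the old file between the last run and the rotation are not picked up by this sketch.

```python
import json
import os

# Hypothetical file remembering where the previous run stopped.
STATE_FILE = "exim_tail_state.json"


def read_new_lines(log_path, state_path=STATE_FILE):
    """Yield complete lines appended to log_path since the last run.

    The saved state is the file's inode and byte offset. If the inode changed
    or the file shrank (e.g. after log rotation), start again from offset 0.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    st = os.stat(log_path)
    offset = state.get("offset", 0)
    if state.get("inode") != st.st_ino or st.st_size < offset:
        offset = 0

    # Binary mode so tell()/seek() work with plain byte offsets
    # (in text mode tell() returns an opaque value).
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            # Stop at EOF or at a partial last line exim is still writing.
            if not line.endswith(b"\n"):
                break
            offset = f.tell()
            yield line.decode("utf-8", errors="replace")

    # Only reached once the caller has consumed every new line.
    with open(state_path, "w") as f:
        json.dump({"inode": st.st_ino, "offset": offset}, f)


if __name__ == "__main__":
    import sys

    # Usage: python tail_exim.py <path to the exim log>
    new_lines = list(read_new_lines(sys.argv[1]))
    print(len(new_lines), "new lines since last run")
```

The state is written only after the generator has been iterated to the end, so a run that crashes midway re-reads the same lines next time instead of silently skipping them.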
Help would be very much appreciated.