
I want to build a Python script that greps through the exim logfiles on my SMTP servers and reports the total number of mails sent to each of the most-used recipient domains every day, week, and month. I'm pretty new to Python and I'm struggling with how to achieve this.

The relevant lines in an exim logfile typically look like this:

Feb 24 00:00:23 smtp1.mail.net exim[5660]: 2014-02-24 00:00:23 1Wuniq-mail-idSo-Fg -> someuser@somedomain.com R=mail T=remote_smtp H=smtp.mail.net [000.00.34.17]

Feb 24 00:00:23 smtp1.mail.net exim[5660]: 2014-02-24 00:00:23 1Wuniq-mail-idSo-Fg -> someuser@somedomain.com R=mail T=remote_smtp H=smtp.mail.net [000.00.34.17]

Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 1Wuniq-mail-idSm-1h => someuser@somedomain.com R=mail T=pop_mail_net H=mta.mail.net [000.00.34.6]

Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 1Wuniq-mail-idSm-1h Completed

The sys, os and re modules should be enough to achieve this(?). I also want to use a dictionary because I want to run the script daily in cron.

The main problem is: how can I grep through only the relevant lines? Exim logs a lot, and I only want the lines containing "=>" or "->", in conjunction with the unique mail ID, which starts with "1W". Also, for the daily run the script must "tail" the logfiles and start at the position where it stopped parsing the last time it was executed. This is necessary to generate a reliable daily count per recipient domain.

Help would be very much appreciated.

evuez
user.py

1 Answer


You can first read the file in reverse with:

    with open(pathToLogFile, "r") as logfile:
        # readlines() loads the whole file into memory;
        # reversing the list puts the newest lines first.
        logFileData = list(reversed(logfile.readlines()))

Then you can extract the parts you want from each log line:

    for line in logFileData:
        temp = None
        if '=>' in line:
            temp = line.split('=>', 1)
        elif '->' in line:
            temp = line.split('->', 1)

        if temp:
            # Left of the arrow: the last whitespace-separated token is the message ID.
            messageId = temp[0].split()[-1]
            # The first three tokens are the syslog date, e.g. "Feb 24 00:00:23".
            timestampString = " ".join(temp[0].split()[:3])
            # Right of the arrow: the first token is the recipient address.
            address = temp[1].split()[0]

            # Only handle messages whose ID starts with "1W", as in the question.
            if messageId.startswith("1W"):
                pass  # Replace with your functionality here
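Building on the loop above, here is a minimal sketch of the per-domain tally the question asks for. It assumes delivery lines have the shape `<message-id> => <address>` or `<message-id> -> <address>`, as in the sample log; the regular expression and the name `count_domains` are illustrative and not part of exim, and other address formats may need a different pattern.

```python
import re
from collections import Counter

# Assumed line shape (taken from the sample log in the question):
#   "... <message-id> => user@domain ..."  or  "... <message-id> -> user@domain ..."
# The "1W" message ID prefix also comes from the question and may differ elsewhere.
DELIVERY_RE = re.compile(r"\s(1W\S*)\s+[=-]>\s+\S+@(\S+)")


def count_domains(lines):
    """Return a Counter mapping recipient domain -> number of delivery lines."""
    counts = Counter()
    for line in lines:
        match = DELIVERY_RE.search(line)
        if match:
            counts[match.group(2).lower()] += 1
    return counts


sample = [
    "Feb 24 00:00:23 smtp1.mail.net exim[5660]: 2014-02-24 00:00:23 "
    "1Wuniq-mail-idSo-Fg -> someuser@somedomain.com R=mail T=remote_smtp",
    "Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 "
    "1Wuniq-mail-idSm-1h => someuser@somedomain.com R=mail T=pop_mail_net",
    "Feb 24 00:00:23 smtp1.mail.net exim[5661]: 2014-02-24 00:00:23 "
    "1Wuniq-mail-idSm-1h Completed",
]

# Most frequent recipient domains first.
print(count_domains(sample).most_common(5))
```

`Counter.most_common(n)` returns the top `n` domains, which is probably what a daily report needs; the "Completed" line is skipped because it contains no arrow.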

To parse the timestamp string, take a look at the python-dateutil package. Also see a relevant answered question.
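If you would rather not install an extra package, the standard library's `datetime.strptime` can parse exim's own timestamp as well. This sketch swaps it in for dateutil and assumes each line carries a syslog prefix ending in `exim[pid]: ` followed by a `YYYY-MM-DD HH:MM:SS` stamp, as in the sample log; `parse_exim_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def parse_exim_timestamp(line):
    """Return exim's timestamp from a syslog-wrapped log line, or None."""
    # Assumption: the syslog prefix ends with "]: " (e.g. "exim[5660]: ").
    _, _, rest = line.partition("]: ")
    try:
        # "2014-02-24 00:00:23" is exactly 19 characters long.
        return datetime.strptime(rest[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


line = ("Feb 24 00:00:23 smtp1.mail.net exim[5660]: 2014-02-24 00:00:23 "
        "1Wuniq-mail-idSo-Fg -> someuser@somedomain.com R=mail")
stamp = parse_exim_timestamp(line)

# Keys for grouping counts per day, per ISO week, and per month.
day_key = stamp.date()
iso_year, iso_week, _ = stamp.isocalendar()
week_key = (iso_year, iso_week)
month_key = (stamp.year, stamp.month)
```

Using the date, the ISO year and week, and the year and month as dictionary keys gives one bucket per reporting period.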

I'm assuming this runs as a standalone script, which means it can't 'remember' where it left off between runs. To work around that, you could edit the log file itself as you read it, marking the position where you stopped with a unique symbol. On the next run, read only from the end of the file back to that symbol.
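As an alternative to writing a marker into the log, the script could store the byte offset it reached in a separate state file and `seek()` to it on the next run. This is a different technique from the marker approach, sketched under these assumptions: the log is only ever appended to or rotated, and both paths are chosen by you. `read_new_lines` and the state file are hypothetical.

```python
import os


def read_new_lines(log_path, state_path):
    """Return the complete lines appended to log_path since the last call."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            offset = int(state.read().strip() or 0)

    # A file shorter than the saved offset was probably rotated: start over.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()

    # Consume only up to the last newline, so a half-written final line
    # is picked up in full on the next run.
    cut = data.rfind(b"\n") + 1
    data = data[:cut]

    with open(state_path, "w") as state:
        state.write(str(offset + cut))

    return data.decode("utf-8", errors="replace").splitlines()
```

Run from cron, each invocation would then see only the lines written since the previous one, which keeps the daily counts from overlapping.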

Edit: corrected timestamp calculation

Community
Tejas Pendse