Well, regex is rarely a magical cure for everything. Regexes are a great tool
for simple string parsing, but they should not replace good old programming!
So if you look at your data, you'll notice that each report always starts with a date and ends with the
"-- End of monthly output --" line. A nice way to handle that is to split your data
into monthly outputs.
Let's start with your data:
>>> s = """\
... Mon Feb 1 09:12:41 GMT 2016
...
... Rotating fax log files:
...
... Doing login accounting:
... total 688.31
... example 401.12
... _mbsetupuser 287.10
... root 0.05
... admin 0.04
...
... -- End of monthly output --
...
... Tue Feb 16 14:27:21 GMT 2016
...
... Rotating fax log files:
...
... Doing login accounting:
... total 0.00
...
... -- End of monthly output --
...
... Thu Mar 3 09:37:31 GMT 2016
...
... Rotating fax log files:
...
... Doing login accounting:
... total 377.92
... example 377.92
...
... -- End of monthly output --"""
And let's split it on that end-of-month line:
>>> reports = s.split('-- End of monthly output --')
>>> reports
['Mon Feb 1 09:12:41 GMT 2016\n\nRotating fax log files:\n\nDoing login accounting:\n total 688.31\n example 401.12\n _mbsetupuser 287.10\n root 0.05\n admin 0.04\n\n', '\n\nTue Feb 16 14:27:21 GMT 2016\n\nRotating fax log files:\n\nDoing login accounting:\n total 0.00\n\n', '\n\nThu Mar 3 09:37:31 GMT 2016\n\nRotating fax log files:\n\nDoing login accounting:\n total 377.92\n example 377.92\n\n', '']
Then you can separate the accounting data from the rest of the log:
>>> report = reports[0]
>>> head, tail = report.split('Doing login accounting:')
Now let's extract the date line:
>>> date_line = head.strip().split('\n')[0]
And fill up a dict with those username/totals pairs:
>>> accounting = dict(zip(tail.split()[::2], tail.split()[1::2]))
The trick here is to use zip() to create pairs out of two slices of tail.split(). The "left"
side of each pair comes from the slice starting at index 0 and taking every second item; the "right"
side comes from the slice starting at index 1 and taking every second item. Which makes:
{'admin': '0.04', 'root': '0.05', 'total': '688.31', '_mbsetupuser': '287.10', 'example': '401.12'}
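To see the pairing in isolation, here is a minimal sketch on a shortened token list. The names `tokens`, `names`, `values` and `as_floats` are only for illustration, and the conversion to float is an optional extra step that the code in this answer does not do (it keeps the totals as strings).

```python
# Illustrative tokens, similar to what tail.split() returns for one report.
tokens = ['total', '688.31', 'example', '401.12', 'root', '0.05']

names = tokens[::2]    # items at even indexes: the user names
values = tokens[1::2]  # items at odd indexes: the totals

# zip() pairs each name with the value that followed it in the log.
accounting = dict(zip(names, values))

# Optional extra step: convert the totals from strings to floats.
as_floats = {name: float(value) for name, value in accounting.items()}
print(as_floats)
```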
Now that this works for one report, you can do the same for every report in a for loop:
import datetime

def parse_monthly_log(log_path='/var/log/monthly.out'):
    with open(log_path, 'r') as log:
        # strip() drops leading/trailing blank lines so the last chunk is empty
        reports = log.read().strip('\n ').split('-- End of monthly output --')
        # filter() skips the empty chunks left by split()
        for report in filter(lambda it: it, reports):
            head, tail = report.split('Doing login accounting:')
            date_line = head.strip().split('\n')[0]
            accounting = dict(zip(tail.split()[::2], tail.split()[1::2]))
            yield {
                # a space-padded day ("Feb  1") becomes zero-padded ("Feb 01")
                'date': datetime.datetime.strptime(date_line.replace('  ', ' 0'), '%a %b %d %H:%M:%S %Z %Y'),
                'accounting': accounting
            }
>>> import pprint
>>> pprint.pprint(list(parse_monthly_log()), indent=2)
[ { 'accounting': { '_mbsetupuser': '287.10',
                    'admin': '0.04',
                    'example': '401.12',
                    'root': '0.05',
                    'total': '688.31'},
    'date': datetime.datetime(2016, 2, 1, 9, 12, 41)},
  { 'accounting': { 'total': '0.00'},
    'date': datetime.datetime(2016, 2, 16, 14, 27, 21)},
  { 'accounting': { 'example': '377.92', 'total': '377.92'},
    'date': datetime.datetime(2016, 3, 3, 9, 37, 31)}]
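If you want to try this without touching /var/log/monthly.out, here is a sketch of a variant that takes the log text directly. The name `parse_monthly_text`, the two marker constants and the `sample` string are my own, hypothetical additions. It also swaps split() for str.partition(), on the assumption that a report might lack the accounting section: partition() returns an empty tail instead of making the unpacking fail.

```python
import datetime

END_MARKER = '-- End of monthly output --'
ACCOUNTING_MARKER = 'Doing login accounting:'

def parse_monthly_text(text):
    """Hypothetical variant of parse_monthly_log() working on a string."""
    for report in text.split(END_MARKER):
        if not report.strip():
            continue  # skip empty or whitespace-only chunks
        # partition() never raises: tail is '' when the marker is missing.
        head, _, tail = report.partition(ACCOUNTING_MARKER)
        date_line = head.strip().split('\n')[0]
        tokens = tail.split()
        yield {
            'date': datetime.datetime.strptime(
                date_line.replace('  ', ' 0'), '%a %b %d %H:%M:%S %Z %Y'),
            'accounting': dict(zip(tokens[::2], tokens[1::2])),
        }

# Made-up sample in the same shape as one report from the question.
sample = (
    'Tue Feb 16 14:27:21 GMT 2016\n\n'
    'Rotating fax log files:\n\n'
    'Doing login accounting:\n'
    '\ttotal 0.00\n\n'
    '-- End of monthly output --\n'
)
results = list(parse_monthly_text(sample))
print(results)
```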
And there you go with a pythonic solution without a single regex.
N.B.: I had to do a little trick with the datetime: the log pads the day number with a space rather than a zero (strptime's %d directive is documented as a zero-padded day), so I used the string .replace() method to change a double space into ' 0' within the date string.
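Here is that date trick on its own, as a small sketch. The `date_line` value is written by hand with a space-padded day, the way the log prints it; `padded` and `parsed` are just illustrative names. Note that the resulting datetime is naive: %Z consumes the "GMT" text but does not attach a timezone.

```python
import datetime

# Day number padded with a space, as the log prints it.
date_line = 'Mon Feb  1 09:12:41 GMT 2016'

# Turn the double space into ' 0' so the day reads "01".
padded = date_line.replace('  ', ' 0')

parsed = datetime.datetime.strptime(padded, '%a %b %d %H:%M:%S %Z %Y')
print(padded)  # Mon Feb 01 09:12:41 GMT 2016
print(parsed)  # 2016-02-01 09:12:41
```

Dates with a two-digit day contain no double space, so .replace() leaves them untouched.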
N.B.: the filter() used in the for report… loop, together with the strip() applied before the split(), removes leading and trailing empty reports, depending on how the log file starts or ends.
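A quick sketch of why both steps matter, using a made-up two-report string (`text`, `chunks`, `kept` and `unstripped` are illustrative names, and the report bodies are placeholders):

```python
MARKER = '-- End of monthly output --'

# Placeholder log that ends with a newline after the last marker.
text = 'report A\n' + MARKER + '\n\nreport B\n' + MARKER + '\n'

# With strip(): the chunk after the last marker is an empty string.
chunks = text.strip('\n ').split(MARKER)
kept = [chunk for chunk in chunks if chunk]  # same effect as the filter()

# Without strip(): the last chunk is '\n', which is truthy and would
# slip through the filter as a bogus "report".
unstripped = text.split(MARKER)

print(chunks)
print(kept)
print(unstripped)
```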