I'm new to Python. I am attempting to write a quick and dirty Python script to find certain strings in log files and extract certain info from the matching lines. The lines in the log file look like this:

2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down
2012-08-01 14:17:10,749 [PDispatcher: ] INFO  Package 1900034442 Queued for clearance.
2012-08-01 14:23:06,998 [PDispatcher: ] ERROR Exception occurred attempting to lookup prod id 90000142

I have a function whose input parameters are a filename and an array of patterns to look for. Currently I can find all lines in the file that contain one or more of the specified patterns (though I'm not sure it's the most efficient way), and I'm able to extract the line number and the line.

def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            #check if pattern strings exist in line
            for sPattern in searchPatterns:
                if sPattern in line:
                    foundItem = [fn, sPattern, lineNo, line]
                    res.append(foundItem)
    return res

searchLogs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) #raw string so \t and \a are not treated as escapes; this should return 3 elements based on the above log snippet (2 for the first line and 1 for the third line)

What I would also like to do is extract the date and time while searching. I was therefore thinking of changing the search patterns into regular expression strings with grouping, so that one match would both find the pattern and extract the date. The only problem is that I'm not sure how to do this in Python. Any help would be appreciated.

Edit (Solution): With help from Sebastian and the link Joel provided, I've come up with this solution:

import re

def search_logs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            #check if pattern strings exist in line
            for sPattern in searchPatterns:
                #crude reg ex to match pattern and if matched, 'group' timestamp
                rex = r'^(.+) \[.*' + sPattern
                ms = re.match(rex, line)
                if ms:
                    time = ms.group(1)
                    #Structs.MatchedItem is my own class (not shown here)
                    item = Structs.MatchedItem(fn, sPattern, lineNo, line, time)
                    res.append(item)
    return res

search_logs(r"c:\temp\app.log", ["ERROR", "DEF_CON"]) #raw string so \t and \a are not treated as escapes; this should return 3 elements based on the above log snippet (2 for the first line and 1 for the third line)
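
A self-contained sketch of the same idea, offered only as one possible variant. `MatchedItem` here is a hypothetical namedtuple standing in for `Structs.MatchedItem`, whose definition is not shown, and `search_lines` with its `filename` parameter are illustrative names. It takes an iterable of lines rather than a filename so it can be tried without a log file, assumes the patterns are literal text (hence `re.escape`), and yields items instead of building a list, as suggested in the comments.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for Structs.MatchedItem (its real definition is not shown).
MatchedItem = namedtuple("MatchedItem", "filename pattern line_no line timestamp")

def search_lines(lines, search_patterns, filename="<lines>"):
    """Yield a MatchedItem for every (line, pattern) pair that matches."""
    # Compile each regex once; re.escape assumes the patterns are literal text.
    compiled = [(p, re.compile(r'^(.+?) \[.*' + re.escape(p)))
                for p in search_patterns]
    for line_no, line in enumerate(lines, 1):
        for pattern, rex in compiled:
            m = rex.match(line)
            if m:
                # group(1) is everything before the first " [", i.e. the timestamp
                yield MatchedItem(filename, pattern, line_no,
                                  line.rstrip("\n"), m.group(1))

sample = [
    "2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down\n",
    "2012-08-01 14:17:10,749 [PDispatcher: ] INFO  Package 1900034442 Queued for clearance.\n",
    "2012-08-01 14:23:06,998 [PDispatcher: ] ERROR Exception occurred attempting to lookup prod id 90000142\n",
]
results = list(search_lines(sample, ["ERROR", "DEF_CON"]))
```

With a real file, the same generator could be fed an open file object, e.g. `with open(fn) as f: results = list(search_lines(f, patterns, fn))`.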
mike01010
  • 2
    http://docs.python.org/howto/regex.html – Joel Cornett Aug 02 '12 at 03:55
  • You should improve your question, questions lacking some research are considered rude in stackoverflow. – Paulo Scardine Aug 02 '12 at 03:59
  • My apologies...this is the very first piece of Python code I've written and, as I said, it is a quick and dirty script meant as a short-term solution for monitoring. – mike01010 Aug 02 '12 at 04:05
  • @Joel..thanks...i think that link does have some good examples that will help me – mike01010 Aug 02 '12 at 04:08
  • 2
    @mike01010: for the 1st python code it is a very good code. A nitpick: use [pep-8 naming conventions](http://www.python.org/dev/peps/pep-0008/#naming-conventions) and you could use `yield found_item` instead of `res.append(found_item)`, also `found_item` should be a [tuple (or namedtuple) instead of a list](http://stackoverflow.com/a/626871/4279). – jfs Aug 02 '12 at 04:08
  • @J.F.Sebastian thanks for the tips/advice...will make the recommended changes. – mike01010 Aug 02 '12 at 04:13
  • @mike01010 So what does "Structs.MatchedItem" do? – Antony Thomas Aug 02 '12 at 14:43

2 Answers

There are two parts:

  • extract datetime string
  • parse it into a datetime object

For the latter, you could use the datetime.strptime() function:

from datetime import datetime

try:
    dt = datetime.strptime(line.split(" [", 1)[0], "%Y-%m-%d %H:%M:%S,%f")
except ValueError:
    dt = None

The former depends on how regular your log files are and how fast and robust you want the solution to be. For example, line.split(" [", 1)[0] is fast but fragile. A more robust solution is:

' '.join(line.split(None, 2)[:2])

but it might be slower.
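
Putting the two parts together, here is a minimal sketch assuming the timestamp is always the first two whitespace-separated fields of the line. The helper name `parse_timestamp` is made up for illustration.

```python
from datetime import datetime

def parse_timestamp(line):
    """Return the leading timestamp as a datetime, or None if it is missing or malformed."""
    # The first two whitespace-separated fields are assumed to be the date and the time.
    fields = line.split(None, 2)[:2]
    try:
        # %f accepts the 3-digit millisecond part after the comma
        return datetime.strptime(" ".join(fields), "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None

line = "2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down"
dt = parse_timestamp(line)
bad = parse_timestamp("no timestamp on this line")
```

Because `strptime()` raises `ValueError` on anything that is not a valid timestamp, lines without one simply come back as `None`.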

jfs
  • Looking at the link Joel provided, I think I may be able to do the search and 'strip' in one line using regular expressions. I haven't quite figured it out yet, but I think I can build an expression that contains the date/time pattern + (pattern1 | pattern2), match on that, and with proper grouping extract the date...I will try for a bit and, if that fails, fall back on your suggestion. – mike01010 Aug 02 '12 at 04:20
  • 1
    @mike01010: It is not necessary to use regex here. `strptime()` does all validation you need. You could use my second suggestion (`' '.join(...)`) to extract datetime part: it always works for correct datetimes, the rest is handled by `strptime()`. – jfs Aug 03 '12 at 03:55
  • thanks again Sebastian. The strptime suggestion is really good to know and was helpful. – mike01010 Aug 06 '12 at 16:26

Here is your regular expression. I have tested the regular expression but not the full code.

import re

def searchLogs(fn, searchPatterns):
    res = []
    with open(fn) as f:
        for lineNo, line in enumerate(f, 1):
            #check if pattern strings exist in line
            for sPattern in searchPatterns:
                if sPattern in line:
                    #month is 01-09 or 10-12; note .group() raises AttributeError if the line has no date/time
                    date = re.search(r'(19|20)\d{2}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])',line).group()
                    time = re.search(r'\b([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9]),[0-9][0-9][0-9]',line).group()
                    foundItem = (fn, sPattern, lineNo, date, time, line) # prefer a tuple over list
                    res.append(foundItem)
    return res

PS : REs are always a pain in the wrong place. Let me know if you need explanation. :)
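
As a rough sketch of how the two expressions could be combined into one compiled pattern with named groups: `TIMESTAMP_RE` and `extract_date_time` are illustrative names, and the pattern assumes the timestamp sits at the very start of the line, as in the sample log. It returns `None` instead of raising when a line has no timestamp.

```python
import re

# Date and time in a single pattern, anchored at the start of the line.
TIMESTAMP_RE = re.compile(
    r'^(?P<date>(?:19|20)\d{2}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])) '
    r'(?P<time>(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9],[0-9]{3})'
)

def extract_date_time(line):
    """Return (date, time) strings, or None if the line does not start with a timestamp."""
    m = TIMESTAMP_RE.match(line)
    if m is None:
        return None
    return m.group("date"), m.group("time")

found = extract_date_time(
    "2012-08-01 13:36:40,449 [PDispatcher: ] ERROR  Fatal error DEF_CON encountered. Shutting down"
)
missing = extract_date_time("no timestamp here")
```

Compiling the pattern once at module level avoids rebuilding it for every line of a large log.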

Antony Thomas
  • Thanks Antony, I was able to come up with a less 'safe' solution based on the previous response. I've edited my original post to provide that solution. – mike01010 Aug 02 '12 at 05:46