
I have a long log file generated with log4j, with 10 threads writing to the log. I am looking for a log analyzer tool that can find lines where the user waited for a long time (i.e. where the difference between consecutive log entries for the same thread is more than a minute).

P.S. I am trying to use OtrosLogViewer, but it only offers filtering by certain values (for example, by thread ID) and does not compare between lines.

P.P.S. The new version of OtrosLogViewer has a "Delta" column that calculates the difference between adjacent log lines (in ms).

thank you

lili
    This sounds extremely specific. Have you considered writing some code to solve this yourself? Alternatively, can we help with the root issue that's causing you to examine the logs in this fashion? – Duncan Jones Aug 30 '12 at 18:09
  • I thought it's a common need. It helps to find the points where users were waiting for a long time – lili Aug 30 '12 at 21:16
  • OtrosLogViewer can filter/highlight log lines. You need to compare two distinct lines, so no log viewer will help you – Raffaele Sep 07 '12 at 21:40

3 Answers


This simple Python script may be enough. For testing, I analyzed my local Apache log, which happens to use the Common Log Format, so you may even be able to reuse it as-is. It simply computes the difference between two subsequent requests and prints the request line for deltas exceeding a certain threshold (1 second in my test). You may want to encapsulate the code in a function that also accepts the thread ID as a parameter, so you can filter further (a per-thread sketch follows the script below).

#!/usr/bin/env python
import re
from datetime import datetime

THRESHOLD = 1  # seconds

last = None
for line in open("/var/log/apache2/access.log"):
    # You may insert here something like
    # if not re.match(THREAD_ID, line):
    #     continue
    # Strip the timezone offset (Python 2's strptime does not support %z), hence the [:-6]
    current = datetime.strptime(
        re.search(r"\[([^]]+)]", line).group(1)[:-6],
        "%d/%b/%Y:%H:%M:%S")
    if last is not None and (current - last).total_seconds() > THRESHOLD:
        print(re.search('"([^"]+)"', line).group(1))
    last = current
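Since the question is about a log4j file written by several threads, here is a minimal per-thread sketch along the same lines. The log4j layout, file name and threshold below are assumptions (the actual format depends on your PatternLayout), so treat it as a starting point: it remembers the last timestamp seen for each thread and only reports gaps within the same thread.

#!/usr/bin/env python
import re
from datetime import datetime

# Assumed log4j layout: "2012-08-30 18:09:01,123 [Thread-7] INFO ..." -- adjust to your PatternLayout
LINE_RE = re.compile(r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),\d+ \[([^\]]+)\]")
THRESHOLD = 60  # seconds, i.e. "more than a minute"

last_seen = {}  # thread name -> (timestamp, full line)
for line in open("application.log"):  # hypothetical file name
    m = LINE_RE.match(line)
    if not m:
        continue  # line does not start with a timestamp (stack trace, etc.)
    current = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    thread = m.group(2)
    if thread in last_seen:
        previous, previous_line = last_seen[thread]
        if (current - previous).total_seconds() > THRESHOLD:
            print(previous_line.rstrip())
            print(line.rstrip())
    last_seen[thread] = (current, line)

Milliseconds are dropped when parsing; if sub-second precision matters, include the ",%f" part in the format string.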
Raffaele

Based on @Raffaele's answer, I made some fixes so it works on any log file, skipping lines that don't begin with the expected timestamp (e.g. a Jenkins console log). In addition, I added min/max thresholds to filter out lines based on duration limits.

#!/usr/bin/env python
import re
from datetime import datetime

MIN_THRESHOLD = 80   # seconds
MAX_THRESHOLD = 100  # seconds

# A word (e.g. the log level) followed by a "YYYY-MM-DD HH:MM:SS" timestamp
timestampPattern = re.compile(r"\w+\s+(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)")
filePath = "C:/Users/user/Desktop/temp/jenkins.log"

lastTime = None
lastLine = ""

with open(filePath, 'r') as f:
    for line in f:
        match = timestampPattern.search(line)
        if not match:
            continue  # skip lines that do not carry a timestamp
        currentTime = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")

        if lastTime is not None:
            duration = (currentTime - lastTime).total_seconds()
            if MIN_THRESHOLD <= duration <= MAX_THRESHOLD:
                print("#######################################################################################################################################")
                print(lastLine)
                print(line)
        lastTime = currentTime
        lastLine = line
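For reference, the regular expression above looks for a word (such as the log level) followed by a "YYYY-MM-DD HH:MM:SS" timestamp, so a hypothetical line like the following would be picked up, while lines without such a prefix are skipped:

INFO 2019-03-14 09:26:53 Started by user admin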
Noam Manos

Apache Chainsaw has a time delta column.

(screenshot of Chainsaw's time delta column)

weberjn