I have code that reads every line from multiple log files in a directory and prints the lines that match a given regex pattern.

Here is the code:

import os
import re

src_dict = "/nfs/home/dex/work/xxx/xxx/logs"
pattern = re.compile('(.*)for exports(.*)')

for passed_files in os.listdir(src_dict):
    files = os.path.join(src_dict, passed_files)
    # "with" closes each log file once it has been read
    with open(files) as strng:
        for lines in strng:
            # use the compiled pattern directly
            if pattern.search(lines):
                print(lines)

The above code gives me all the required lines from the log files, each with a time at the end of the line:

./xx.xx.xx.v1.0_Final:2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: xx::xx::xxx::xx for exports [... stuff ...] after 00:26:15

.....................and so on.

Now I want to fetch the time at the end of each of these lines, i.e. 00:26:15 (the value differs from line to line), and calculate the total and average time over all the values collected from these log files.

user24343
MfjoneZ
  • Please edit your question and post your code with correct indentation. If it's not possible to edit your own question, close and ask again. – Tobias Brösamle Feb 05 '19 at 14:48
  • Sorry for the unclear code; I have edited the question and posted the code. – MfjoneZ Feb 05 '19 at 16:02
  • This question is really unclear. You want to only find the lines with the timestamp 00:26:15, does that value vary? Also, what would be the point of an average if they all have the same timestamp? Is the log file full of lines like the one you posted? – I Funball Feb 05 '19 at 16:21
  • Why don't you just add it to the pattern you're already sorting by? – user24343 Feb 05 '19 at 16:30
  • The pattern I searched for gives me the list of all the timestamps; the line above was just one example from the log. And yes, the timestamps vary on each line, as the above code parses all the services' log files and gives only the elapsed times. I need the average time to work out how long it takes for this set of services to run. Hope that is clear now. – MfjoneZ Feb 06 '19 at 08:18
  • @user24343 What regular expression should I use in the existing pattern to get the desired output? – MfjoneZ Feb 11 '19 at 10:02

1 Answer

Just add the capturing of the "timestamp" [1] to the regular expression you're using anyway.

For that, use "capture groups".

To match a time in the format HH:MM:SS, you need two digits, a colon, two digits, another colon and another two digits. Quantifying that is hard, so you'll probably just write it exactly like that:

(\d\d):(\d\d):(\d\d) (if you want, you can write each \d\d as \d{2}, but to me that seems more complicated).

Note the parentheses around the digit matchers: they tell re to capture the contents separately and make them available to you as match.group(number), with numbering starting at 1, and as match.groups(), a tuple. To make sure you match at the end of the line (not in the middle), add $ (strictly, this shouldn't be necessary, as .* is "greedy", but it's clearer).
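As a small illustration of capture groups, here is a sketch that applies only the time part of the pattern to the sample line from the question. The names duration_re, sample, hours and all_parts are mine, and the elided parts of the log line are left exactly as they were posted.

```python
import re

# Only the HH:MM:SS part, anchored at the end of the line
duration_re = re.compile(r'(\d\d):(\d\d):(\d\d)$')

# Sample line from the question (elided parts kept as posted)
sample = ("./xx.xx.xx.v1.0_Final:2019-01-30 08:34:46.463 -0800 INFO [626] - "
          "Program Ended: xx::xx::xxx::xx for exports [... stuff ...] after 00:26:15")

match = duration_re.search(sample)
hours = match.group(1)       # first capture group, as a string
all_parts = match.groups()   # all capture groups as a tuple of strings
print(all_parts)
```

The groups come back as strings, so they still need int() before any arithmetic.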

If you add this to your regex (removing the groups you already have if you don't need them, or accounting for them otherwise), you get:

pattern = re.compile(r'.* for exports .* (\d\d):(\d\d):(\d\d)$')

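Now you can match that and get the time for every run, in seconds, like this:

match = pattern.match(logline)
seconds = (int(match.group(1))*60 + int(match.group(2))) * 60 + int(match.group(3))

pattern.match returns None for lines that don't match, so check for that before calling match.group.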
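To get from there to the total and average the question asks for, something along these lines should work. It is only a sketch: the helper names duration_seconds and total_and_average are made up for illustration, and the example lines are hypothetical stand-ins shaped like the sample line in the question.

```python
import re

pattern = re.compile(r'.* for exports .* (\d\d):(\d\d):(\d\d)$')

def duration_seconds(logline):
    """Return the trailing HH:MM:SS of a matching line in seconds, else None."""
    match = pattern.match(logline)
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return (hours * 60 + minutes) * 60 + seconds

def total_and_average(loglines):
    """Sum and average the durations of all matching lines."""
    durations = [d for d in map(duration_seconds, loglines) if d is not None]
    total = sum(durations)
    average = float(total) / len(durations) if durations else 0.0
    return total, average

# Hypothetical lines, shaped like the sample in the question
example_lines = [
    "2019-01-30 08:34:46.463 -0800 INFO [626] - Program Ended: a for exports [x] after 00:26:15\n",
    "2019-01-30 09:00:00.000 -0800 INFO [627] - Program Ended: b for exports [y] after 00:10:05\n",
    "2019-01-30 09:00:01.000 -0800 INFO [628] - unrelated line\n",
]
total, average = total_and_average(example_lines)
print(total, average)
```

In the loop from the question, you would collect the matching lines (or their seconds) in a list instead of printing them, and call total_and_average once all files have been read.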


I said above that quantifying is hard, but it can be done. I'm aware of a two-step approach: you first capture the whole timestamp, and then process it separately. In this case the added complication is too much, but if you have something different, it might be good to keep in mind:

.*?((?:\d{2}:)+\d\d)$ captures any number of two-digit groups separated by colons; you can then simply .split(':') it and perform your calculations. Note that the leading .*? has to be non-greedy here: a greedy .* would swallow the leading 00: and leave only 26:15 in the group.
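A rough sketch of that two-step idea follows. The names run_re and to_seconds are made up for illustration. The sketch uses a non-greedy .*? prefix so that the captured run keeps its leading 00:, and it assumes every colon-separated field is base 60 (so it handles MM:SS as well as HH:MM:SS).

```python
import re

# Step 1: capture the whole colon-separated run at the end of the line.
# The prefix is non-greedy so the run is captured from its first field.
run_re = re.compile(r'.*?((?:\d{2}:)+\d\d)$')

def to_seconds(logline):
    """Return the trailing colon-separated time in seconds, else None."""
    match = run_re.match(logline)
    if match is None:
        return None
    # Step 2: split the run and fold it, treating each field as base 60
    seconds = 0
    for part in match.group(1).split(':'):
        seconds = seconds * 60 + int(part)
    return seconds

print(to_seconds("Program Ended: xx for exports [... stuff ...] after 00:26:15"))
```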


[1] A "timestamp" usually refers to a specific point in time, not a duration. So as not to confuse you, I kept your word "timestamp" in my answer. The actual timestamp in your log output is 2019-01-30 08:34:46.463, not 00:26:15.

user24343