
I am trying to grab values from a log file using Python's regular expressions, and in the process I use a few `if` statements. The part of the code that grabs the values is as follows:

import re

# Opening the log file for reading
# (logFile and the output files contLocal, contGlobal, contCumulative
#  are assumed to be opened elsewhere in the script)
with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()

        # To extract Time or iteration
        if 'Time' in line:
            iteration_time = re.findall(r'^Time\s+=\s+(.*)', line)

        # To extract local, global and cumulative values
        if 'local' in line:
            local_global_cumu = re.search(r'sum\s+local\s+=\s+(.*),\s+global\s+=\s+(.*),\s+cumulative\s+=\s+(.*)', line)
            if local_global_cumu:
                contLocal_0_value = local_global_cumu.group(1)
                contGlobal_0_value = local_global_cumu.group(2)
                contCumulative_0_value = local_global_cumu.group(3)
            for t in iteration_time:
                contLocal.write("%s\t%s\n" % (t, contLocal_0_value))
                contGlobal.write("%s\t%s\n" % (t, contGlobal_0_value))
                contCumulative.write("%s\t%s\n" % (t, contCumulative_0_value))

        # To extract execution and cpu time
        if 'ExecutionTime' in line:
            execution_cpu_time = re.search(r'^ExecutionTime\s+=\s+(.*)\s+s\s+ClockTime\s+=\s+(.*)\s+s', line)
            if execution_cpu_time:
                execution_time_0_value = execution_cpu_time.group(1)
                cpu_time_0_value = execution_cpu_time.group(2)
            for t in iteration_time:
                print(t)

Inside the second `if` statement I get the values of `t`. However, in the subsequent `if` statement, printing `t` produces no output at all. I am not sure where I have gone wrong.

hypersonics
  • `if` is not a loop. [And you're using `if` incorrectly.](http://stackoverflow.com/questions/20002503/why-does-a-b-or-c-or-d-always-evaluate-to-true) – Ashwini Chaudhary Jan 20 '15 at 02:25
  • I'd guess that `iteration_time` is empty, or the body of `if ('ExecutionTime' or 'ClockTime') in line:` is never executed. – Aran-Fey Jan 20 '15 at 02:27
  • Thanks @Ashwini Chaudhary. Can you give me some hints please? – hypersonics Jan 20 '15 at 02:28
  • `if ('ExecutionTime' or 'ClockTime') in line:` is working correctly. I tested this by printing both `execution_time_0_value` and `cpu_time_0_value` and they are successful. Thanks @Rawing – hypersonics Jan 20 '15 at 02:30
  • Another thing I notice: if I insert `print iteration_time` immediately before the second `if` statement I get all the values, but if I do the same immediately after that `if` statement, I get an empty list. – hypersonics Jan 20 '15 at 02:37
  • Thanks @JonClements. I have now modified my `if` statements to only look for one keyword, yet it does not print the values of `iteration_time` beyond the second `if` statement. I have also edited my original post to reflect this change. – hypersonics Jan 20 '15 at 02:56
  • `iteration_time` only gets set when your first `if` block runs... I'm surprised you're not getting a `NameError`... - so your `for t in iteration_time` will be referring to the time that last got set by your first `if`... – Jon Clements Jan 20 '15 at 02:58
  • I am not getting any error, @Jon Clements. If you look at the beginning of my code, all these `if` statements are evaluated inside the `for` loop. If I am not wrong, for every line it strips, the value of `t` is obtained and carried over to the next `if` statement. Hence I get all the values of `t`, except after `if 'ExecutionTime' in line:` – hypersonics Jan 20 '15 at 03:10
  • What I'm pointing out is that `iteration_time` has the potential to be carried across iterations of the for-loop, eg: it gets set on the first line, but never reset again, so all following iterations use whatever the last value was - that might well be desired behaviour, but we don't know that. I can't see anything wrong with your code other than that, so I'm afraid it's up to you to put some `print`s in or run it through a debugger – Jon Clements Jan 20 '15 at 03:14
  • If `Time` is in the line but doesn't match your regex, then your result will be `[]`... however, when `local` appears in one line and `ExecutionTime` in another, they may not see the same `iteration_time`, as it only gets (re)set when `Time` is in the line - in fact "ExecutionTime" contains "Time" - so your regex will trigger, not match, return `[]`, then your "ExecutionTime" block runs with an empty `iteration_time`... that's it - your line can't both start with "Time" and "ExecutionTime" :) – Jon Clements Jan 20 '15 at 03:16

1 Answer


The following checks whether "Time" is a substring of the line, then attempts to find all matches on a line that begins with "Time":

if 'Time' in line:
    iteration_time = re.findall(r'^Time\s+=\s+(.*)', line)

The following also contains the word "Time":

if 'ExecutionTime' in line:
    execution_cpu_time = re.search(r'^ExecutionTime\s+=\s+(.*)\s+s\s+ClockTime\s+=\s+(.*)\s+s', line)

When the second block then loops over `iteration_time`, the list is empty: the first `if` has already run on this same line, and because the line does not start with "Time", `re.findall` returned an empty list.
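A quick check of that mismatch on a single line. The sample line is made up, assumed to have the shape the question's `ExecutionTime` regex expects:

```python
import re

# Hypothetical sample line, assumed to look like the question's log
line = "ExecutionTime = 0.56 s  ClockTime = 1 s"

# Plain substring test: "Time" occurs inside "ExecutionTime", so this is True
print('Time' in line)

# The anchored pattern requires the line to *start* with "Time", so this is []
print(re.findall(r'^Time\s+=\s+(.*)', line))

# str.startswith checks the beginning of the line, like the ^ anchor does
print(line.startswith('Time'))
```

The substring test and the anchored regex disagree, which is exactly how the first `if` runs yet produces nothing.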

Let's just pretend you have a single line, starting with "ExecutionTime", and let's walk through it...

  • `if 'Time' in line` is true, so `re.findall` runs and returns all matches for a line that starts with 'Time'... This will be empty because the line doesn't start with 'Time', so `iteration_time = []`
  • `if 'ExecutionTime' in line` is also true, and the line does start with 'ExecutionTime', but when you reach `for t in iteration_time` it won't loop, because the block above has just set it to be empty!
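The walkthrough can be reproduced with a stripped-down copy of the question's loop run over a hypothetical two-line log (`log_lines` is invented to match the regexes, not taken from the real file):

```python
import re

# Hypothetical two-line log, assumed to mimic the question's file
log_lines = [
    "Time = 0.001",
    "ExecutionTime = 0.56 s  ClockTime = 1 s",
]

iteration_time = []
for line in log_lines:
    line = line.rstrip()
    if 'Time' in line:
        # Runs for BOTH lines, because "ExecutionTime" contains "Time"
        iteration_time = re.findall(r'^Time\s+=\s+(.*)', line)
        print(repr(line), '->', iteration_time)
    if 'ExecutionTime' in line:
        # iteration_time was just overwritten with [] on this same line
        print('times seen by the ExecutionTime block:', iteration_time)
```

The `Time = 0.001` line yields `['0.001']`, but the `ExecutionTime` line immediately resets `iteration_time` to `[]`, so the `ExecutionTime` block has nothing to loop over.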
Jon Clements
  • Thanks @Jon Clements. In my log file `Time` and `ExecutionTime` are on two different lines, and the regex uses `^`, which only matches at the start of a line. Both `Time` and `ExecutionTime` are at the start of their own lines. – hypersonics Jan 20 '15 at 03:32
  • Yes, but "Time" is in "ExecutionTime" which means that first `if` actually runs... the findall will return nothing because the line **doesn't** start with "Time" it starts with "ExecutionTime"... get it? So when the `if 'ExecutionTime'` block runs, `iteration_time` will be emptied by the first if... – Jon Clements Jan 20 '15 at 03:34
  • Thanks @JonClements. However, I am still a bit confused. Look at my typical log file here: http://stackoverflow.com/questions/28017121/extracting-multiple-strings-using-pythonss-regular-expression/28017210?noredirect=1#comment44422473_28017210 – hypersonics Jan 20 '15 at 03:50
  • @Deepak it doesn't matter how your data looks, it's just logic... I've tried to spell it out with an update to the answer - if you don't get that, there's not much more I can do I'm afraid :) – Jon Clements Jan 20 '15 at 04:01
  • @Deepak anyway - probably a quick fix is to change your line from `if 'Time' in line` to be `if line.startswith('Time')` – Jon Clements Jan 21 '15 at 10:15
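A minimal sketch of that quick fix applied to the `Time`/`ExecutionTime` blocks, assuming each `ExecutionTime` line should be paired with the most recent `Time = ...` value. The function `collect_execution_times` and the sample lines are hypothetical, and the sketch swaps `re.findall` for a compiled pattern's `match`, since only one value per line is expected:

```python
import re

TIME_RE = re.compile(r'^Time\s+=\s+(.*)')
EXEC_RE = re.compile(r'^ExecutionTime\s+=\s+(.*)\s+s\s+ClockTime\s+=\s+(.*)\s+s')


def collect_execution_times(lines):
    """Pair each ExecutionTime/ClockTime line with the most recent Time value."""
    current_time = None  # carried across lines on purpose
    rows = []
    for line in lines:
        line = line.rstrip()
        if line.startswith('Time'):  # no longer fires on "ExecutionTime ..." lines
            m = TIME_RE.match(line)
            if m:
                current_time = m.group(1)
        elif line.startswith('ExecutionTime'):
            m = EXEC_RE.match(line)
            if m and current_time is not None:
                rows.append((current_time, m.group(1), m.group(2)))
    return rows


# Hypothetical sample input in the shape the regexes expect
sample = [
    "Time = 0.001",
    "ExecutionTime = 0.56 s  ClockTime = 1 s",
    "Time = 0.002",
    "ExecutionTime = 1.1 s  ClockTime = 2 s",
]
for t, exec_t, clock_t in collect_execution_times(sample):
    print(t, exec_t, clock_t)
```

Carrying `current_time` across lines is deliberate here; as noted in the comments on the question, whether that is the wanted behaviour depends on the log. Using `elif` is safe because no line can start with both "Time" and "ExecutionTime".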