Suppose I have to read a fairly big file (about 20,000 lines). I have to loop through the lines and look for a keyword, e.g. STACKOVERFLOW. Once the keyword is found, I know that I have to process the next 10 lines.

Currently I am doing this:

with open(filepath) as f:
    for line_idx, line in enumerate(f):
        if re.match(my_keyword, line):
            # do something here from line_idx to line_idx + 9
            # can i jump directly to line_idx + 10 ???

Is there a way to skip the loop-and-search step for the next 10 lines once the keyword is found, and resume looping and searching at, e.g., line_idx + 10?

Thank you!

UPDATE

I would like to add that I am looking for a way that does not require temporarily storing the file in a list. I already have a solution of my own that uses a list.

scmg

2 Answers

0

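You can read the lines into a list and walk it with an index-based while loop instead of a for-each loop (reassigning the loop variable inside `for i in range(...)` has no effect on the next iteration in Python, so a plain for loop cannot skip ahead):

with open(filepath) as f:
    lines = f.readlines()

i = 0
while i < len(lines):
    if re.match(my_keyword, lines[i]):
        # do something with lines[i:i + 10]
        i += 10  # jump past the block that was just processed
    else:
        i += 1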

It will use more memory than what you're doing currently, though, because you're reading the entire file into memory at once. Something to keep in mind.

Alternatively, if reading the entire file into memory is a problem, you could hack something together:

with open(filepath) as f:
    skip = 0
    for line in f:
        if skip <= 0:
            if re.match(my_keyword, line):
                skip = 10
        else:
            skip -= 1
            print(line) # The next ten lines after a match can be processed here
JGut
  • with your alternative, you still have to read the entire file into the `lines` list, right? so what is the difference? – scmg Jul 12 '18 at 11:10
  • @scmg In the first method, you read the entire file into the variable `lines`. In the second method, you iterate line-by-line over the file, so you only have one line in memory at any given time. See https://stackoverflow.com/questions/6475328/read-large-text-files-in-python-line-by-line-without-loading-it-in-to-memory – JGut Jul 12 '18 at 11:15
  • so what is the `lines[i]` in your command `if re.match(my_keyword, lines[i])`? and how am i supposed to process the next 10 lines if the keyword is found at a line? e.g. print them out – scmg Jul 12 '18 at 11:24
  • @scmg Oops, you're right. That was left over from copy-pasting. I edited my answer. As for printing the other lines, that can be done in the `else` block. – JGut Jul 12 '18 at 11:26
  • can you please provide an example? i'm not sure if it works in the `else` block ... – scmg Jul 12 '18 at 11:36
  • @scmg Sure, added to my answer. If I'm understanding what you're asking, it shouldn't be too difficult – JGut Jul 12 '18 at 11:52
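
To make the `else`-block idea from these comments concrete, here is a minimal sketch that wraps the skip counter in a generator and hands back the lines after each match as one list. The names `blocks_after_keyword`, `block_size`, and `remaining` are made up for this illustration and do not come from the answer; the sketch assumes the block consists of the lines after the matched line, not the matched line itself.

```python
import io
import re


def blocks_after_keyword(lines, keyword, block_size=10):
    """Yield a list of the `block_size` lines that follow each matching line."""
    remaining = 0  # lines still to collect for the current block
    block = []
    for line in lines:
        if remaining > 0:
            # Inside a block: collect the line without searching it.
            block.append(line)
            remaining -= 1
            if remaining == 0:
                yield block
                block = []
        elif re.match(keyword, line):
            remaining = block_size
    if block:
        # The input ended before the block was complete.
        yield block


# In-memory stand-in for a real file; any iterable of lines works.
sample = io.StringIO("a\nSTACKOVERFLOW\nb\nc\nd\ne\n")
for block in blocks_after_keyword(sample, "STACKOVERFLOW", block_size=3):
    print(block)
```

With a real file, the same call would be `blocks_after_keyword(f, my_keyword)` inside `with open(filepath) as f:`. Only the current block is held in memory, never the whole file.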
0

A possible solution:

with open(filepath, "r") as f:
    lines = f.readlines()

req_lines = []
count = 0
while count < len(lines):
    if re.match(my_keyword, lines[count]):
        # collect the 10 lines after the match (fewer near the end of the file)
        req_lines.extend(lines[count + 1:count + 11])
        count += 11  # resume searching after the collected lines
    else:
        count += 1

The lines you need are now in the variable `req_lines`, and you can perform any operation on them you want.
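
If holding the whole file in a list is not acceptable (as the update to the question asks), one option is to let the search loop and `itertools.islice` share a single iterator: whatever `islice` consumes is never seen by the outer loop. This is a different technique from the answers above, shown only as a sketch. The function name `process_file` and the `block_size` parameter are hypothetical, and the sketch assumes the block is the lines after the matched line.

```python
import io
import re
from itertools import islice


def process_file(f, keyword, block_size=10):
    """Return, for each match, the list of lines that follow it."""
    blocks = []
    lines = iter(f)  # one shared iterator over the file
    for line in lines:
        if re.match(keyword, line):
            # islice pulls the next lines from the same iterator, so the
            # outer loop resumes after them without searching them.
            blocks.append(list(islice(lines, block_size)))
    return blocks


# In-memory stand-in for a real file object.
sample = io.StringIO("a\nSTACKOVERFLOW\nb\nc\nd\ne\n")
print(process_file(sample, "STACKOVERFLOW", block_size=3))
```

With a real file this would be called as `process_file(f, my_keyword)` inside `with open(filepath) as f:`. If the matched line itself should count as part of the block, it can be prepended to the slice and `block_size` reduced by one.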