
There is a solution for searching backward within a single line in "Python Reverse Find in String":

s.rfind('I', 0, index)
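As a quick illustration of how the end argument bounds the search, here is a minimal sketch. The sample string and the way index is obtained are made up for the example:

```python
s = "I think I can, I know I can"   # made-up sample string
index = s.index("know")             # position to search backward from

# rfind(sub, start, end) returns the highest position of sub inside
# s[start:end], or -1 if sub does not occur in that slice.
pos = s.rfind("I", 0, index)
print(pos)

# Searching again with the previous hit as the new end bound
# walks further back through the string.
earlier = s.rfind("I", 0, pos)
print(earlier)
```

Note that the match is case-sensitive, so the lowercase "i" in "think" is skipped.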

But what if I need to search for a string in the several lines above that line? Say I have found the keyword using:

with open("file.txt") as f:
    searchlines = f.readlines()

for i, line in enumerate(searchlines):
    if "keyword" in line:
        do_something()

I want do_something() to search backward for another keyword. To apply the code above, I think I would need f.read() to get the whole file as one string. But that seems crazy, since I would have to call both readlines() and read() on the (large) file. I need readlines() because the first keyword may appear several times in the text, and I need to find every occurrence.

Is there any better way to do this?

Sample of file.txt (an image in the original post):

@engineer
- kỹ sư
@engineering
- kỹ thuật
- civil e. ngành xây dựng
- communication e. kỹ thuật thông tin
- control e. kỹ thuật [điều chỉnh, điều khiển] (tự động)
- development e. nghiên cứu những kết cấu mới
Ooker
  • Could you explain the picture a little bit? And it would be nice if it were text rather than an image. – Avinash Raj Aug 30 '15 at 14:01
  • Which part don't you understand? Doesn't my question already explain the picture? I don't know whether you need the lines from the picture or not. It is there just for illustration. But if you insist... – Ooker Aug 30 '15 at 14:04
  • https://regex101.com/r/iL2fT6/1 – Avinash Raj Aug 30 '15 at 14:07
  • This question could be useful: [How to read from a file in python starting from the end](http://stackoverflow.com/questions/3568833/how-to-read-lines-from-a-file-in-python-starting-from-the-end) – Mel Aug 30 '15 at 15:12

1 Answer


I'd approach it this way: since you want to find the line starting with @, store the lines in a list as you read them, and discard the stored lines whenever a new line starting with @ is found.

Thus we get:

def do_something(lines):
    print("I've got:")
    print(''.join(lines))

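lines = []

with open("file.txt") as f:
    for line in f:
        # A line starting with '@' begins a new entry: forget the old lines.
        if line.startswith('@'):
            lines = []

        lines.append(line)
        # Keyword found: lines now holds the entry from its '@' line to here.
        if 'development' in line:
            do_something(lines)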

With file.txt as you have it, the output will be:

I've got:
@engineering
- kỹ thuật
- civil e. ngành xây dựng
- communication e. kỹ thuật thông tin
- control e. kỹ thuật [điều chỉnh, điều khiển] (tự động)
- development e. nghiên cứu những kết cấu mới
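Since the keyword may appear several times, the same idea can be wrapped in a generator that yields one block per match. This is only a sketch: the function name blocks_up_to_keyword, its parameters, and the sample lines are made up for illustration, and it assumes every entry starts with a line beginning with '@'.

```python
def blocks_up_to_keyword(lines, keyword, marker="@"):
    """For each line containing keyword, yield the lines from the most
    recent marker line up to and including that line."""
    block = []
    for line in lines:
        if line.startswith(marker):
            block = []           # a new entry starts: forget the previous one
        block.append(line)
        if keyword in line:
            yield list(block)    # copy, so later appends do not alter it


# Made-up sample; any iterable of lines works, including an open file.
sample = [
    "@engineer\n",
    "- first definition\n",
    "@engineering\n",
    "- civil e.\n",
    "- development e.\n",
]

for found in blocks_up_to_keyword(sample, "development"):
    print("".join(found), end="")
```

Because the function only iterates over its argument, it can be handed the file object directly, so the file is still read once and never loaded whole.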

More generally, if you just want the last N lines seen, you can use a collections.deque instead of a list:

from collections import deque
N = 100
# A deque with maxlen=N silently drops the oldest line once it holds N lines.
last_lines = deque(maxlen=N)

with open("file.txt") as f:
    for line in f:
        last_lines.append(line)
        if 'development' in line:
            do_something(last_lines)

Now do_something will be passed up to the last 100 lines, including the current line, whenever the current line contains the word development.
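To see the sliding window in action, here is a small sketch with N reduced to 3. The helper name windows_ending_at and the in-memory io.StringIO stand-in for file.txt are assumptions made for the demonstration:

```python
from collections import deque
import io


def windows_ending_at(lines, keyword, n):
    """For each line containing keyword, yield up to the last n lines
    seen so far, the matching line included."""
    last_lines = deque(maxlen=n)    # the oldest line falls off automatically
    for line in lines:
        last_lines.append(line)
        if keyword in line:
            yield list(last_lines)  # snapshot of the current window


# In-memory stand-in for a real file (hypothetical contents).
fake_file = io.StringIO("a\nb\nc\nkeyword 1\nd\nkeyword 2\n")

result = list(windows_ending_at(fake_file, "keyword", 3))
for window in result:
    print(window)
```

The first window ends at "keyword 1" and the second at "keyword 2"; each holds at most three lines, so memory use stays bounded no matter how large the file is.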

  • Simply brilliant. Instead of searching backward, you just remember all the text between two keywords and reset the "memory" if it is not the right one. – Ooker Aug 30 '15 at 18:52