Improving efficiency of search loop python

Question

I have written code that reads a file, finds the line containing the word table_begin, and then counts the number of lines until the line containing the word table_end.

Here is my code:

for line in read_file:
    if "table_begin" in line:
        k=read_file.index(line)
    if 'table_end' in line:
        k1=read_file.index(line)
        break

count=k1-k
if count<10:
    q.write(file)

I have to run it on ~15K files, and since it's a bit slow (~1 file/sec), I was wondering if I am doing something inefficient. I was not able to find the problem myself, so any help would be great!

@AlokThakur Sorry, it was just a typo. It's the same file, named `read_file`. I made the change. – Rakesh Tripathi, Feb 04 '16 at 05:29

score 8 · Accepted Answer · edited May 23 '17 at 12:17


When you do read_file.index(line), you are scanning through the entire list of lines, just to get the index of the line you're already on. This is likely what's slowing you down. Instead, use enumerate() to keep track of the line number as you go:

for i, line in enumerate(read_file):
    if "table_begin" in line:
        k = i
    if "table_end" in line:
        k1 = i
        break
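For completeness, here is a minimal sketch of the whole count built on `enumerate`. The helper name `table_span` and the sample lines are hypothetical, not from the question, and returning `None` when a marker is missing is an assumption about what you would want in that case. Each `list.index` call is a linear scan from the start of the list, whereas `enumerate` supplies the index for free, so the loop stays a single pass.

```python
def table_span(lines):
    """Return the number of lines from table_begin to table_end, or None.

    Hypothetical helper: `lines` is any list of strings, such as the
    question's read_file.
    """
    begin = end = None
    for i, line in enumerate(lines):
        # Record only the first table_begin.
        if begin is None and "table_begin" in line:
            begin = i
        # Accept table_end only once table_begin has been seen.
        if begin is not None and "table_end" in line:
            end = i
            break
    if begin is None or end is None:
        return None  # assumption: a missing marker means "no table"
    return end - begin


# Made-up sample data for illustration.
sample = ["header\n", "table_begin\n", "row 1\n", "row 2\n", "table_end\n", "footer\n"]
print(table_span(sample))  # 3
```

The question's `count<10` check would then become `span is not None and span < 10`.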


answered Feb 04 '16 at 05:40

Claudiu


The OP may not be familiar with `enumerate` so you could add a word or link on that :). – The6thSense Feb 04 '16 at 05:43

A Small Shell Script · Answer 2 · 2016-02-04T06:12:19.630

You are always checking for both strings on every line. In addition, index is expensive because it searches the whole list of lines, not just the current line. Using "in" or "find" will be quicker, as will checking only for table_begin until you've found it, and only for table_end after that. If you aren't sure that each file has table_begin and table_end in that order (and only one of each), you may need some extra checks here (maybe pairing your begin/end into tuples?).

EDIT: Incorporated enumerate and switched from a while to a for loop, allowing some complexity to be removed.

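def find_lines(filename):
    bookends = ["table_begin", "table_end"]
    # Read all lines once; the with block closes the file afterwards.
    with open(filename) as f:
        lines = f.readlines()
    for bookend in bookends:
        # Each search starts again from the first line of the file.
        for ind, line in enumerate(lines):
            if bookend in line:
                yield ind
                break

# Prints the index of the table_begin line, then of the table_end line.
for line_number in find_lines(r"myfile.txt"):
    print(line_number)
print("done")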
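If a file can contain more than one table, the tuple-pairing idea mentioned above could look roughly like this. It is only a sketch: `table_spans` is a hypothetical name, and it assumes that a table_end with no preceding table_begin should be ignored and that the two markers never share a line.

```python
def table_spans(lines):
    """Return a list of (begin_index, end_index) tuples, one per table."""
    spans = []
    begin = None  # index of the currently open table_begin, if any
    for ind, line in enumerate(lines):
        if begin is None:
            # Not inside a table: look only for the opening marker.
            if "table_begin" in line:
                begin = ind
        elif "table_end" in line:
            # Inside a table: look only for the closing marker.
            spans.append((begin, ind))
            begin = None
    # An unmatched trailing table_begin is dropped (assumption).
    return spans
```

Each tuple then gives the table length as `end - begin`, so the files of interest are those where some span is shorter than 10.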

score 1 · Answer 3 · answered Feb 04 '16 at 13:55

Clearly, you obtain read_file with f.readlines(), which is a bad idea because it reads the whole file into memory.

You can save a lot of time by reading the file line by line, searching for one keyword at a time, and stopping after 10 lines.

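with open('test.txt') as read_file:
    counter = 0
    # Skip ahead to the line containing "table_begin".
    for line in read_file:
        if "table_begin" in line:
            break
    # Continue from the next line; stop at "table_end" or after 10 lines.
    # This assumes "table_begin" always comes before "table_end".
    for line in read_file:
        counter += 1
        if "table_end" in line or counter >= 10:
            break
    # q and file are the names used in the question.
    if counter < 10:
        q.write(file)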
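Since this has to run over ~15K files, it may help to wrap the same streaming idea in a function. The sketch below is one way to do that, not the only one: `has_short_table`, `short_table_files`, and the `limit` parameter are hypothetical names. Unlike the snippet above, it treats a file with no table_begin, or one that ends before table_end, as not matching; that is an assumption about the desired behaviour.

```python
def has_short_table(path, limit=10):
    """Return True if table_end appears fewer than `limit` lines after table_begin."""
    with open(path) as read_file:
        # Consume lines up to and including the one with table_begin.
        for line in read_file:
            if "table_begin" in line:
                break
        else:
            return False  # assumption: no table_begin means no match
        # Keep reading from the same file iterator, counting from 1.
        for counter, line in enumerate(read_file, start=1):
            if counter >= limit:
                return False  # table is too long; stop reading early
            if "table_end" in line:
                return True
    return False  # file ended before table_end (assumption: no match)


def short_table_files(paths, limit=10):
    """Return the paths whose first table is shorter than `limit` lines."""
    return [path for path in paths if has_short_table(path, limit)]
```

The returned paths can then be written out in one place, instead of calling `q.write` inside the search loop.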
