
I am trying to open specific lines of multiple files and return those lines for each file. My solution is quite time-consuming. Do you have any suggestions?
func.filename: the name of the given file
func.start_line: the starting point in the given file
func.end_line: the finishing point in the given file

from sys import stderr

def method_open(func):
    try:
        # use a context manager so the file is closed after reading
        with open(func.filename) as f:
            body = f.readlines()[func.start_line:func.end_line]
    except IOError:
        body = []
        stderr.write("\nCouldn't open the referenced method inside {0}".
                     format(func.filename))
        stderr.flush()
    return body

Bear in mind that sometimes func.filename is the same file as a previous call, but unfortunately this is not the case most of the time.

  • Why is it taking so much time? Is the file very large? If so then how large? – Muhammad Tahir Apr 05 '16 at 14:30
  • You can try the [linecache](https://docs.python.org/2/library/linecache.html) module or `itertools.islice` and see if they're also too time consuming. See more details here: http://stackoverflow.com/questions/2081836/reading-specific-lines-only-python – Paulo Almeida Apr 05 '16 at 14:47
  • no, I think the problem is that I am opening and closing the file over and over again, but linecache could be interesting if it had the capability of reading multiple lines at the same time. – Mehrdad Mehraban Apr 05 '16 at 15:37
  • I think I will have to time each solution to know which one is the fastest. – Mehrdad Mehraban Apr 05 '16 at 15:43
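As a follow-up to the linecache suggestion in the comments: it can in fact hand back multiple lines at once, because the stdlib function `linecache.getlines()` returns the whole cached file as a list that can be sliced. A minimal sketch (the helper name `read_lines_cached` is made up here for illustration):

```python
import linecache

def read_lines_cached(filename, start_line, end_line):
    # linecache caches the file contents after the first call, so
    # repeated lookups into the same file avoid reopening it each time.
    # getlines() returns every line; slicing yields the wanted range.
    return linecache.getlines(filename)[start_line:end_line]
```

Note that linecache keeps each file's contents in memory, so this trades memory for speed when the same file is queried repeatedly.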

1 Answer


The problem with readlines is that it reads the whole file into memory, and linecache does the same.

You can save some time by reading one line at a time and breaking out of the loop as soon as you reach func.end_line.

But the best method I found is to use itertools.islice.

Here are the results of some tests I ran on a 130 MB file of ~9701k lines:

--- 1.43700003624 seconds --- f_readlines
--- 1.00099992752 seconds --- f_enumerate
--- 1.1400001049 seconds --- f_linecache
--- 0.0 seconds --- f_itertools_islice

Here is the script I used:

import time
import linecache
import itertools


def f_readlines(filename, start_line, endline):
    # reads the entire file into memory, then slices
    with open(filename) as f:
        return f.readlines()[start_line:endline]


def f_enumerate(filename, start_line, endline):
    # reads one line at a time and stops once endline is reached
    result = []
    with open(filename) as f:
        for i, line in enumerate(f):
            if start_line <= i < endline:
                result.append(line)
            if i >= endline:
                break
    return result


def f_linecache(filename, start_line, endline):
    # linecache numbers lines from 1, so shift the 0-based range by one
    result = []
    for n in range(start_line + 1, endline + 1):
        result.append(linecache.getline(filename, n))
    return result


def f_itertools_islice(filename, start_line, endline):
    # islice reads lazily and stops consuming the file at endline
    with open(filename) as f:
        return list(itertools.islice(f, start_line, endline))


def runtest(func_to_test):
    filename = "testlongfile.txt"
    start_line = 5000
    endline = 10000
    start_time = time.time()
    func_to_test(filename, start_line, endline)
    print("--- %s seconds --- %s" % (time.time() - start_time,
                                     func_to_test.__name__))

runtest(f_readlines)
runtest(f_enumerate)
runtest(f_linecache)
runtest(f_itertools_islice)
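Applied back to the question's method_open, the islice version would look roughly like this (a sketch, keeping the original error handling and attribute names):

```python
import itertools
from sys import stderr

def method_open(func):
    try:
        # islice reads lazily: the file is consumed only up to
        # func.end_line, then reading stops and the file is closed.
        with open(func.filename) as f:
            return list(itertools.islice(f, func.start_line, func.end_line))
    except IOError:
        stderr.write("\nCouldn't open the referenced method inside {0}".
                     format(func.filename))
        stderr.flush()
        return []
```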
Francesco
  • interesting! This actually answers my question on which method is the fastest unlike the mentioned question: http://stackoverflow.com/questions/2081836/reading-specific-lines-only-python Thank you! – Mehrdad Mehraban Apr 06 '16 at 07:49