Faster way of getting files with certain text in them in Python?

Question

I have a large number of files in a list called files. I loop through them and keep every file that has //StackOverflow on its very first line. The line may have additional text after it, but it must begin with that text.

Currently I am doing it simply like so:

matches = []
for file in files:
    with open(file, "r") as inf:
        line = inf.readline()
        if line.strip().startswith("//StackOverflow"):
            matches.append([line] + inf.readlines())

However, I was wondering if there's a better (faster?) way of doing this, since now I have to open every single file one by one and always read the first line.

How else would you know that the files start with `//StackOverflow` unless you open them and read the first line? — Tim Pietzcker, Dec 25 '12 at 10:15
I know I have to open them anyways, I was just wondering if there's a FASTER or better way to do this :) — , Dec 25 '12 at 12:08

score 2 · Accepted Answer · answered Dec 25 '12 at 10:19

You will have to open all the files if you need to look at their contents. What you have is already pretty much the best you can do in Python.

In theory, you could read only the first 15 bytes of the file and check whether they are equal to //StackOverflow, but I doubt that will change much.

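matches = []
for file in files:
    with open(file) as inf:
        # Read just the 15-character prefix before deciding to read the rest
        if inf.read(15) == "//StackOverflow":
            inf.seek(0)  # rewind so readlines() includes the first line
            matches.append(inf.readlines())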
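
A variation on the snippet above, offered as a minimal sketch rather than a measured improvement: opening the file in binary mode avoids decoding text for files that do not match. The names `MARKER`, `starts_with_marker` and `collect_matches` are made up for illustration. Unlike the `line.strip().startswith(...)` check in the question, this only matches when the marker sits at the very start of the file, with no leading whitespace.

```python
MARKER = b"//StackOverflow"  # hypothetical constant; 15 bytes long


def starts_with_marker(path, marker=MARKER):
    """Return True if the file at `path` begins with `marker`."""
    # Binary mode: only len(marker) bytes are requested and nothing is decoded.
    with open(path, "rb") as inf:
        return inf.read(len(marker)) == marker


def collect_matches(paths):
    """Return the lines of every file in `paths` that starts with MARKER."""
    matches = []
    for path in paths:
        if starts_with_marker(path):
            # Reopen in text mode only for the files that matched.
            with open(path, "r") as inf:
                matches.append(inf.readlines())
    return matches
```

With the list from the question, this would be called as `matches = collect_matches(files)`. Matching files are opened twice, so this is only likely to help when most files do not match.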

Tim Pietzcker

I guess there's no better way then, this'll do... :P – Dec 25 '12 at 12:09

score 1 · Answer 2 · answered Dec 25 '12 at 10:16

If you are on Linux, you might consider using standard command-line tools such as find, head and grep. They are written in C and can be much faster.

warvariuc

It is not clear whether an IO-bound task will be any faster in any language. [Unoptimized C++ code can be slower than Python](http://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python) – jfs Dec 25 '12 at 12:26
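
Since the task is likely IO-bound, the most reliable way to settle which variant is faster is to time them on your own files. Below is a rough sketch using the standard `timeit` module. The helper names `best_time` and `first_line_matches` are hypothetical, and the numbers will depend heavily on whether the files are already in the OS file cache.

```python
import timeit


def first_line_matches(path, marker="//StackOverflow"):
    """The check from the question: stripped first line starts with marker."""
    with open(path, "r") as inf:
        return inf.readline().strip().startswith(marker)


def best_time(check, paths, repeat=5):
    """Return the best wall-clock time in seconds of running check over paths."""
    def run():
        for path in paths:
            check(path)

    # Take the minimum over several runs to reduce noise from other processes.
    return min(timeit.repeat(run, number=1, repeat=repeat))
```

For example, `best_time(first_line_matches, files)` gives a baseline to compare against any alternative check that takes a single path.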
