I have a large number of files in a list called files. I am looping through them and saving the contents of every file that has //StackOverflow at the start of its very first line. The line may contain additional text after it, but it must begin with that text.

Currently I do it like this:

matches = []
for file in files:
    with open(file, "r") as inf:
        line = inf.readline()
        if line.strip().startswith("//StackOverflow"):
            matches.append([line] + inf.readlines())

However, I was wondering if there's a better (faster?) way of doing this, since now I have to open every single file one by one and always read the first line.

  • How else would you know that the files start with `//StackOverflow` unless you open them and read the first line? – Tim Pietzcker Dec 25 '12 at 10:15
  • I know I have to open them anyway; I was just wondering whether there is a faster or better way to do this :) –  Dec 25 '12 at 12:08

2 Answers

You will have to open all the files if you need to look at their contents. What you have is already pretty much the best you can do in Python.

In theory, you could read only the first 15 bytes of the file (the length of //StackOverflow) and check whether they are equal to //StackOverflow, but I doubt that will change much.

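matches = []
for file in files:
    with open(file) as inf:
        # Read only as much as is needed to compare against the marker.
        if inf.read(15) == "//StackOverflow":
            inf.seek(0)  # rewind so readlines() also returns the first line
            matches.append(inf.readlines())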
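
The same prefix check could also be done in binary mode, so that the comparison is made on raw bytes and no text decoding happens before a file is known to match. This is only a minimal sketch: `collect_matches` and `PREFIX` are names chosen here for illustration and are not part of the original answer, and the returned lines are `bytes` rather than `str`.

```python
PREFIX = b"//StackOverflow"  # the 15-byte marker from the question


def collect_matches(paths, prefix=PREFIX):
    """Return the lines (as bytes) of every file that starts with `prefix`."""
    matches = []
    for path in paths:
        # Binary mode: read(n) returns at most n bytes and skips decoding.
        with open(path, "rb") as inf:
            if inf.read(len(prefix)) == prefix:
                inf.seek(0)  # rewind so the first line is included
                matches.append(inf.readlines())
    return matches
```

Unlike the `strip().startswith(...)` test in the question, this comparison does not tolerate leading whitespace before the marker.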
Tim Pietzcker

If you are on Linux, you might consider using the built-in tools, such as find, head and grep. They are written in C and can be much faster.

warvariuc
  • It is not clear whether an I/O-bound task will be any faster in another language. [Unoptimized C++ code can be slower than Python](http://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python) – jfs Dec 25 '12 at 12:26
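
Whether one variant is actually faster can be measured rather than guessed. The following is a rough sketch using the standard `timeit` module. The helper names (`first_line_check`, `prefix_check`, `make_sample_files`) and the generated sample files are assumptions made for illustration; timings on a real file set, disk and cache state may look quite different.

```python
import os
import tempfile
import timeit


def first_line_check(paths):
    """Approach from the question: read the first line and test its start."""
    hits = 0
    for path in paths:
        with open(path, "r") as inf:
            if inf.readline().strip().startswith("//StackOverflow"):
                hits += 1
    return hits


def prefix_check(paths):
    """Approach from the first answer: read only the first 15 characters."""
    hits = 0
    for path in paths:
        with open(path, "r") as inf:
            if inf.read(15) == "//StackOverflow":
                hits += 1
    return hits


def make_sample_files(directory, count=200):
    """Create hypothetical sample files; every second one has the marker."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, "sample_%d.txt" % i)
        with open(path, "w") as out:
            if i % 2 == 0:
                out.write("//StackOverflow\n")
            out.write("some line\n" * 100)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_sample_files(tmp)
        for func in (first_line_check, prefix_check):
            # Time 20 full passes over the sample files for each approach.
            seconds = timeit.timeit(lambda: func(paths), number=20)
            print("%s: %.4f s" % (func.__name__, seconds))
```

Because the sample files are small and stay in the operating system's cache, this mostly measures Python-level overhead, not disk access.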