
Possible Duplicate:
Python: How to read huge text file into memory

To process a large text file (1 GB+) line by line, random access by line number is desired, and most importantly, without loading the whole file content into RAM. Is there a Python library that can do that?

This is useful when analyzing a large log file; read-only access is enough.

If there is no such standard library, I have to look for an alternative: a function or class that can return the N-th line of a big string-like object, so that I can mmap (yes, I mean a memory-mapped file object) the file into such an object and then do line-based processing on it.
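For example, here is a minimal sketch of what I have in mind, using the standard mmap module (the file name is a placeholder, and it assumes the file ends with a newline):

import mmap

with open("big.log", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    offsets = [0]                    # offsets[i] = byte offset where line i starts
    pos = mm.find(b"\n")
    while pos != -1:
        offsets.append(pos + 1)
        pos = mm.find(b"\n", pos + 1)
    n = 1234                         # arbitrary 0-based line number
    line = mm[offsets[n]:offsets[n + 1] - 1]   # bytes of line n, newline stripped
    mm.close()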

Thank you.

PS: The log file is almost certain to have lines of variable length.


1 Answer


I think something like the code below might work, since the file object's readline() method reads one line at a time. If the lines are of arbitrary length, you need to index their starting positions first, as follows.

lines = [0]                      # lines[i] = file offset where line i starts
with open("testmat.txt") as f:
    while f.readline():
        lines.append(f.tell())   # record the offset just past each line
    # now you can read an arbitrary line, e.g. line 1235 (0-based):
    f.seek(lines[1235])
    line = f.readline()

If the lines were all the same length, you could just do f.seek(linenumber * line_length).
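For example, a quick sketch of that fixed-width case (the file name and width here are assumptions):

line_length = 81                       # e.g. 80 data bytes + 1 newline byte
with open("fixed.txt", "rb") as f:     # binary mode, so offsets are exact byte counts
    linenumber = 500
    f.seek(linenumber * line_length)   # jump straight to the 0-based line
    line = f.readline()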
