Fastest way to read in a very long file starting on arbitrary line X

Question

I have a text file which is written to by a Python program, and then read in by another program for display in a web browser. Currently it is read in by JavaScript, but I will probably move this functionality to Python, and have the results passed to JavaScript using an Ajax request.

The file is updated irregularly, sometimes by one appended line, sometimes by as many as ten. I then need to get an updated copy of the file to JavaScript for display in the web browser. The file may grow to as large as 100,000 lines. New data is always added to the end of the file.

As it is currently written, the JavaScript checks the length of the file once per second, and if the file is longer than it was the last time it was read, it reads the whole file again from the beginning. This quickly becomes unwieldy for files of 10,000+ lines, doubly so since the program may sometimes need to update the file every single second.

What is the fastest/most efficient way to get the data displayed to the front end in javascript?

I am thinking I could:

1. Keep track of how many lines the file had before, and only read in from that point in the file next time.
2. Have one program pass the data directly to the other without it reading an intermediate file (although the file must still be written to as a permanent log for later access).

Are there specific benefits/problems with each of these approaches? How would I best implement them?

For Approach #1, I would rather not call file.next() 15,000 times in a for loop to get to where I want to start reading the file. Is there a better way?

For Approach #2, since I need to write to the file no matter what, am I saving much processing time by not reading it too?

Perhaps there are other approaches I have not considered?

Summary: The program needs to display in a web browser data from Python that is constantly being updated and may grow as long as 100k lines. Since I am checking for updates every second, it needs to be efficient, just in case it has to do a lot of updates in a row.

You might be interested in [How can I tail a log file in Python?](http://stackoverflow.com/questions/12523044/how-can-i-tail-a-log-file-in-python) — Marc J, Apr 08 '16 at 23:55

score 1 · Answer 1 · edited May 23 '17 at 12:15

Opening a large file and reading the last part is simple and quick: Open the file, seek to a suitable point near the end, read from there. But you need to know what you want to read. You can easily do it if you know how many bytes you want to read and display, so keeping track of the previous file size will work well without keeping the file open.

If you have recorded the previous size (in bytes), read the new content like this:

with open("logfile.txt", "rb") as fp:
    fp.seek(old_size, 0)     # Skip the part that was already read
    new_content = fp.read()  # Read everything past that point

On Python 3, this reads bytes, which must be decoded to str. If the file's encoding is latin1, it would go like this:

new_content = str(new_content, encoding="latin1")
print(new_content)

You should then update old_size and save the value in persistent storage for the next round. You don't say how you record context, so I won't suggest a way.
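Putting the steps above together, here is a minimal sketch of one polling round. The function name `read_new_content` and its parameters are made up for illustration, and it assumes the log is only ever appended to (or occasionally truncated and restarted). It returns only complete lines, so a line that is still being written is picked up on the next round.

```python
import os

def read_new_content(path, old_size, encoding="latin1"):
    """Return (text appended since old_size, new offset in bytes).

    Hypothetical helper: the caller stores the returned offset and
    passes it back in as old_size on the next call.
    """
    # If the file shrank, assume it was truncated or replaced and start over.
    if os.path.getsize(path) < old_size:
        old_size = 0
    with open(path, "rb") as fp:
        fp.seek(old_size, 0)   # Skip everything that was already read
        data = fp.read()       # Read everything past that point
    # Keep only complete lines; a trailing partial line is left for next time.
    end = data.rfind(b"\n") + 1
    return data[:end].decode(encoding), old_size + end
```

A caller would start with `old_size = 0`, then on each request do `text, old_size = read_new_content("logfile.txt", old_size)` and send `text` to the browser. Where `old_size` is stored between requests is left open, as above.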

If you can keep the file open continuously in a server process, go ahead and do it the tail -f way, as in the question that @MarcJ linked to.
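For the keep-it-open case, a rough sketch of the tail -f idea might look like the generator below. The name `follow` and the `max_idle_polls` parameter are invented here (the latter only so the loop can be stopped), and the caller is assumed to have already positioned the file, for example with `fp.seek(0, 2)` to skip the existing content.

```python
import time

def follow(fp, poll_interval=1.0, max_idle_polls=None):
    """Yield lines as they are appended to an already-open file.

    Hypothetical helper: stops after max_idle_polls consecutive empty
    reads, or runs forever if max_idle_polls is None.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = fp.readline()
        if line:
            idle = 0
            yield line  # May lack a trailing newline if the writer is mid-line
        else:
            idle += 1
            time.sleep(poll_interval)  # Nothing new yet; wait and retry
```

This trades a little memory (one open file handle in the server process) for not having to store the offset anywhere, since the file object remembers its own position.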

score 1 · Answer 2 · edited Jun 20 '20 at 09:12

The function you seek is seek. From the docs:

f.seek(offset, from_what)

The position is computed from adding offset to a reference point; the reference point is selected by the from_what argument. A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point.
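As an illustration of the reference points, here is a small sketch that reads the last few bytes of a file by seeking relative to the end. The helper name `read_tail` is made up for this example, and the file is opened in binary mode, where seeks relative to the end are allowed.

```python
import os

def read_tail(path, max_bytes):
    """Return up to max_bytes bytes from the end of the file (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)        # Reference point 2: end of file
        size = f.tell()               # Position at the end equals the file size
        # Step back from the end, but never past the start of the file.
        f.seek(-min(max_bytes, size), os.SEEK_END)
        return f.read()
```

`os.SEEK_SET`, `os.SEEK_CUR` and `os.SEEK_END` are the named constants for the values 0, 1 and 2.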

Limitation for Python 3:

In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour.

Note that seeking to a specific line is tricky, since lines can be variable length. Instead, take note of the current position in the file (f.tell()), and seek back to that.
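A minimal sketch of that tell-and-seek-back pattern in text mode follows. The function name `read_lines_from` and the utf-8 default are assumptions for the example. The position is treated as an opaque value: it is only ever obtained from f.tell() (or is zero) and handed back to f.seek(), which stays within the limitation quoted above.

```python
def read_lines_from(path, position=0, encoding="utf-8"):
    """Return (new lines, position to resume from) for a text file.

    Hypothetical helper: position must be 0 or a value previously
    returned by this function for the same file.
    """
    with open(path, "r", encoding=encoding) as f:
        f.seek(position)          # Jump straight to where the last read stopped
        text = f.read()           # Everything appended since then
        position = f.tell()       # Remember where to resume next time
    return text.splitlines(), position
```

On the first call, pass `position=0`; afterwards, pass back whatever position the previous call returned.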
