3

During `for line in f:`, my code saves the lines which contain a specific piece of data. Unfortunately, I have to read the entire file to be sure it is the most recent data. Then I have to scan the entire file (between 5000 and 8000 lines) again until I find the correct line, several times (once for each piece of data).

So, my question is: is it possible to open a file, go to a specific line, read it, and do it again? I saw different answers about it, but I can't save the whole file in a str because I don't have that much RAM on my device... That's why I want to search directly in the file.

Rekoc
  • if the lines have a fixed size or previous lines don't change, yes, by computing/caching line positions in the file; otherwise it's impossible. You'd be better off with a database or a binary file. – Jean-François Fabre Oct 05 '17 at 09:56
  • When you use `for line in f`, the data is read using a generator. So the full file is never loaded into memory at once. – N M Oct 05 '17 at 10:02
  • Unfortunately, no @Jean-FrançoisFabre, the size isn't fixed and can't be fixed. It was one of my first ideas. – Rekoc Oct 05 '17 at 10:04
  • @NM, yeah, I know that, and I want to keep this process. That's why I don't want to use `readlines()`. – Rekoc Oct 05 '17 at 10:07
  • yes, you could loop on the lines (not using `readlines`) and break when a counter is reached. – Jean-François Fabre Oct 05 '17 at 10:07
  • Would using a dictionary whose values are file locations (obtained with `file.tell`), then using `file.seek` to go to that position and `file.readline` to read the line, work? – N M Oct 05 '17 at 10:11
  • 1
    related (but not duplicate): https://stackoverflow.com/questions/620367/how-to-jump-to-a-particular-line-in-a-huge-text-file – Jean-François Fabre Oct 05 '17 at 10:17
  • Is it your requirement to find the single most recent instance of the "specific data" in the file? If so then you could search backwards from the end of the file for the target, which would be the most recent instance if the file is updated temporally. Is there only one target item, or are there several? – mhawke Oct 05 '17 at 10:41
  • I already tried @mhawke; most of the time you're right, the data I'm looking for is the most recent, at the end of the file, but sometimes it can be at the beginning or somewhere else :-/. And yes, there are several items. – Rekoc Oct 05 '17 at 11:23

3 Answers

5

Do it with iterators and generators. A file's `xreadlines` (Python 2) is lazily evaluated, so the file is not loaded into memory until you consume it:

def drop_and_get(skipping, it):
    # advance the iterator past `skipping` items, then return the next one
    for _ in xrange(skipping):
        next(it)
    return next(it)

f = xrange(10000)  # let's say your file is this generator
drop_and_get(500, iter(f))
500

So you can actually do something like:

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f.xreadlines())
    print your_line

You can actually even skip `xreadlines`, since the file object is an iterator itself:

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f)
    print your_line
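
As suggested in the comments, here is a minimal Python 3 sketch of the same idea, using `itertools.islice` to do the lazy skipping (in Python 3, `xrange` is gone and `print` is a function), so the file is still never loaded into memory at once:

from itertools import islice

def drop_and_get(skipping, it):
    # islice lazily discards the first `skipping` items;
    # next() then returns the item that follows them
    return next(islice(it, skipping, None))

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f)
    print(your_line)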
Netwave
  • @Jean-FrançoisFabre, yes, but it will solve the RAM problem, since it should not be storing the strings in memory. – Netwave Oct 05 '17 at 10:08
  • you don't need to create an iterator from the file handle, it is already one. – Jean-François Fabre Oct 05 '17 at 10:11
  • @Jean-FrançoisFabre, yes, it was for making the example work, but I will change it. Thanks – Netwave Oct 05 '17 at 10:11
  • @Jean-FrançoisFabre, the `f` xrange object should be transformed into an iterator in order to consume it, give it a try :D – Netwave Oct 05 '17 at 10:13
  • 2
    yes, my bad, you're right. Else `next` cannot be called. But with a file handle it works. Fair enough. You could add python 3 compatible code to your example. – Jean-François Fabre Oct 05 '17 at 10:13
  • Thank you a lot for your quick answer @DanielSanchez. I've never used these iterators and generators; I'll check the documentation and give feedback about it! – Rekoc Oct 05 '17 at 10:20
1

Daniel's solution is very good. A simpler alternative is to loop over the file handle and break when the required line is reached. Then you can resume the loop to actually process those lines.

Note that there's no miracle: unless the size of the lines never changes (in which case you could memorize the file positions and seek to them afterwards), you have to read all the file data from the start. You just don't need to store it in memory using `readlines()`. Never use `readlines()`.
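
As an aside, here is a minimal sketch of the `tell()`/`seek()` caching idea mentioned in the comments, assuming the lines never change and the file isn't rewritten between passes (`leases_file` is the asker's filename). In Python 3, calling `tell()` while iterating a text file with `for line in f` raises an error, so the index is built with `readline()`:

# build an index: offsets[i] is the byte offset where line i starts
offsets = []
with open(leases_file, "r") as f:
    while True:
        offset = f.tell()
        if not f.readline():  # empty string means end of file
            break
        offsets.append(offset)

# later: jump straight to, say, line 5000 without rescanning the file
with open(leases_file, "r") as f:
    f.seek(offsets[5000])
    print(f.readline().rstrip())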

Here's my naive approach; it doesn't use generators or complex stuff, but it is just as efficient, and simple:

with open(yourfile, "r") as f:
    # skip the first 5000 lines (indices 0 to 4999)
    for i, line in enumerate(f):
        if i == 4999:
            break

    # process the rest of the file, starting at line 5000
    for line in f:
        print(line.rstrip())
Jean-François Fabre
0

Below, you can find my code:

with open(leases_file, 'r') as f:
    for line in f:
        ...  # save the line numbers of the lines I need
    for l in list_ip.values():  # do it for each saved line number
        f.seek(0)  # go back to the beginning
        for i, line in enumerate(f):
            # looking for the right line
            if i == (l - 1):  # l contains the line number
                break
        for line in f:
            ...  # read the data

I tried it again this morning; maybe it's because I do `f.seek(0)`? It's the only difference between my code and yours.
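
For what it's worth, here is a minimal sanity check (reusing `leases_file` from above) suggesting that `f.seek(0)` by itself does rewind the file correctly, so the problem may lie elsewhere:

# quick check: seek(0) lets you iterate the same file twice
with open(leases_file, "r") as f:
    first_pass = sum(1 for _ in f)
    f.seek(0)  # rewind to the beginning
    second_pass = sum(1 for _ in f)
    assert first_pass == second_pass  # both passes see every line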

Rekoc