3

During `for line in f:`, my code saves the lines which contain a specific piece of data. Unfortunately, I have to read the entire file to be sure it is the most recent data. Then I have to scan the entire file (between 5000 and 8000 lines) again until I find the correct line, several times (once for each piece of data).

So, my question is: is it possible to open a file, go to a specific line, read it, and do it again? I saw different answers about it, but I can't save the whole file in a str because I don't have that much RAM on my device... That's why I want to search directly in the file.

Rekoc
  • if the lines have a fixed size or previous lines don't change, yes, by computing/caching line positions in the file; otherwise it's impossible. You'd be better off with a database or a binary file. – Jean-François Fabre Oct 05 '17 at 09:56
  • When you use `for line in f`, the data is read using a generator. So the full file is never loaded into memory at once. – N M Oct 05 '17 at 10:02
  • Unfortunately, no @Jean-FrançoisFabre, the size isn't fixed and can't be fixed. It was one of my first ideas. – Rekoc Oct 05 '17 at 10:04
  • @NM, yeah, I know that, and I want to keep this process. That's why I don't want to use `readlines()`. – Rekoc Oct 05 '17 at 10:07
  • yes, you could loop on the lines (not using `readlines`) and break when a counter is reached. – Jean-François Fabre Oct 05 '17 at 10:07
  • Would using a dictionary whose values are file locations (obtained with `file.tell`), then using `file.seek` to go to that position and `file.readline` to read the line, work? – N M Oct 05 '17 at 10:11
  • 1
    related (but not duplicate): https://stackoverflow.com/questions/620367/how-to-jump-to-a-particular-line-in-a-huge-text-file – Jean-François Fabre Oct 05 '17 at 10:17
  • Is it your requirement to find the single most recent instance of the "specific data" in the file? If so then you could search backwards from the end of the file for the target, which would be the most recent instance if the file is updated temporally. Is there only one target item, or are there several? – mhawke Oct 05 '17 at 10:41
  • I already tried @mhawke; most of the time you're right, the data I'm looking for is the most recent, at the end of the file, but sometimes it can be at the beginning or somewhere else :-/. And yes, there are several items. – Rekoc Oct 05 '17 at 11:23

3 Answers

5

Do it with iterators and generators. A file's `xreadlines` (Python 2) is lazily evaluated, so the file is not loaded into memory until you consume it:

def drop_and_get(skipping, it):
    # advance the iterator past `skipping` items, then return the next one
    for _ in xrange(skipping):
        next(it)
    return next(it)

f = xrange(10000)  # let's say your file is this generator
drop_and_get(500, iter(f))
500

So you can actually do something like:

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f.xreadlines())
    print your_line

You can actually even skip `xreadlines`, since the file object is an iterator itself:

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f)
    print your_line
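
As suggested in the comments, here is a minimal Python 3 sketch of the same idea, using `itertools.islice` to do the lazy skipping (in Python 3, `xrange` is gone and `print` is a function), so the file is still never loaded into memory at once:

from itertools import islice

def drop_and_get(skipping, it):
    # islice lazily discards the first `skipping` items;
    # next() then returns the item that follows them
    return next(islice(it, skipping, None))

with open(yourfile, "r") as f:
    your_line = drop_and_get(5000, f)
    print(your_line)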
Netwave
  • @Jean-FrançoisFabre, yes, but it will solve the RAM problem, since it should not be storing the strings in memory. – Netwave Oct 05 '17 at 10:08
  • you don't need to create an iterator from the file handle, it is already one. – Jean-François Fabre Oct 05 '17 at 10:11
  • @Jean-FrançoisFabre, yes, it was for making the example work, but I will change it. Thanks – Netwave Oct 05 '17 at 10:11
  • @Jean-FrançoisFabre, the `f` xrange object should be transformed into an iterator in order to consume it, give it a try :D – Netwave Oct 05 '17 at 10:13
  • 2
    yes, my bad, you're right. Else `next` cannot be called. But with a file handle it works. Fair enough. You could add python 3 compatible code to your example. – Jean-François Fabre Oct 05 '17 at 10:13
  • Thank you a lot for your quick answer @DanielSanchez. I've never used these iterators and generators; I'll check the documentation and give feedback about it! – Rekoc Oct 05 '17 at 10:20
1

Daniel's solution is very good. A simpler alternative is to loop over the file handle and break when the required line is reached. Then you can resume the loop to actually process those lines.

Note that there's no miracle: unless the size of the lines never changes (in which case you could memorize the file positions and seek to them afterwards), you have to read all the file data from the start. You just don't need to store it in memory using `readlines()`. Never use `readlines()`.
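
As an aside, here is a minimal sketch of the `tell()`/`seek()` caching idea mentioned in the comments, assuming the lines never change and the file isn't rewritten between passes (`leases_file` is the asker's filename). In Python 3, calling `tell()` while iterating a text file with `for line in f` raises an error, so the index is built with `readline()`:

# build an index: offsets[i] is the byte offset where line i starts
offsets = []
with open(leases_file, "r") as f:
    while True:
        offset = f.tell()
        if not f.readline():  # empty string means end of file
            break
        offsets.append(offset)

# later: jump straight to, say, line 5000 without rescanning the file
with open(leases_file, "r") as f:
    f.seek(offsets[5000])
    print(f.readline().rstrip())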

Here's my naive approach; it doesn't use generators or complex stuff, but it is just as efficient, and simple:

with open(yourfile, "r") as f:
    # skip the first 5000 lines (indices 0 to 4999)
    for i, line in enumerate(f):
        if i == 4999:
            break

    # process the rest of the file, starting at line 5000
    for line in f:
        print(line.rstrip())
Jean-François Fabre
0

Below, you can find my code:

with open(leases_file, 'r') as f:
    for line in f:
        ...  # save the line numbers of the lines I need
    for l in list_ip.values():  # do it for each saved line number
        f.seek(0)  # go back to the beginning
        for i, line in enumerate(f):
            # looking for the right line
            if i == (l - 1):  # l contains the line number
                break
        for line in f:
            ...  # read the data

I tried it again this morning; maybe it's because I do `f.seek(0)`? It's the only difference between my code and yours.
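
For what it's worth, here is a minimal sanity check (reusing `leases_file` from above) suggesting that `f.seek(0)` by itself does rewind the file correctly, so the problem may lie elsewhere:

# quick check: seek(0) lets you iterate the same file twice
with open(leases_file, "r") as f:
    first_pass = sum(1 for _ in f)
    f.seek(0)  # rewind to the beginning
    second_pass = sum(1 for _ in f)
    assert first_pass == second_pass  # both passes see every line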

Rekoc