I would like to read a very large file starting from the line that contains a specific word. What is the best way to do that?

Let's say it is a file with 50K lines:

43511
24622
53213
43534
57656
12121

I want to start reading lines of this file from the line that has 43534. What would be the most efficient way to do this for a large file?

Medya Gh
  • [python: how to jump to a particular line in a huge text file?](http://stackoverflow.com/questions/620367/python-how-to-jump-to-a-particular-line-in-a-huge-text-file) – Sam Nicholls Jul 12 '13 at 16:31
  • that link is for going to a specific "line number", but this is for line that has a "specific word in it" – Medya Gh Jul 12 '13 at 16:32
  • Do you know the line number? Do all lines have an identical number of characters? – tommy.carstensen Apr 19 '14 at 14:15

3 Answers

You could use `itertools.dropwhile`:

t = '''43511
24622
53213
43534
57656
12121
'''


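from io import StringIO
from itertools import dropwhile

# StringIO stands in for a real file object here
with StringIO(t) as f:
    # lines keep their trailing '\n' (not os.linesep) in text mode,
    # so strip it before comparing; dropwhile skips until the first match
    for x in dropwhile(lambda x: x.rstrip('\n') != '43534', f):
        print(x, end='')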
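
The same idea applied to a file on disk might look like the sketch below. It assumes "has a specific word" means a substring match, so it uses `word not in line` rather than an equality test; note that this would also match a longer number that merely contains `43534`. The helper name `lines_from` and the temporary demo file are made up for illustration.

```python
import os
import tempfile
from itertools import dropwhile


def lines_from(path, word):
    """Yield lines of `path`, starting at the first line that contains `word`."""
    with open(path, "r") as f:
        # The file is read lazily, one line at a time, so memory use stays small.
        yield from dropwhile(lambda line: word not in line, f)


# Demo on a small temporary file (hypothetical data taken from the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("43511\n24622\n53213\n43534\n57656\n12121\n")
    demo_path = tmp.name

result = [line.rstrip("\n") for line in lines_from(demo_path, "43534")]
print(result)  # ['43534', '57656', '12121']
os.remove(demo_path)
```

Every line before the match still has to be read once; there is no way to skip ahead without knowing where the line is.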
iruvar

One way to do it manually without using much memory could be something like this:

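found = False
with open('file.txt', 'r') as f:
    for line in f:
        # each line keeps its trailing newline, so strip it before comparing
        if line.rstrip('\n') == '43534':
            found = True
        if found:
            # you have now reached the line in the file and
            # can begin processing it (and every later line) here
            print(line, end='')
            # f.tell() is not usable inside a `for line in f` loop;
            # read with f.readline() if you need the buffer position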
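
If you do need the position, here is a rough sketch of one way to get it. In Python 3, calling `f.tell()` on a text file inside a `for line in f` loop raises `OSError`, so this version opens the file in binary mode and reads with `readline()`. The function names `find_offset` and `read_from_offset` and the temporary demo file are hypothetical.

```python
import os
import tempfile


def find_offset(path, target):
    """Return the byte offset of the first line equal to `target`, or None."""
    with open(path, "rb") as f:
        while True:
            pos = f.tell()        # offset of the line about to be read
            line = f.readline()
            if not line:          # empty bytes object means end of file
                return None
            if line.rstrip(b"\r\n") == target:
                return pos


def read_from_offset(path, offset):
    """Yield decoded lines of `path`, starting at byte `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)            # jump straight to the saved position
        for line in f:
            yield line.rstrip(b"\r\n").decode()


# Demo on a small temporary file (hypothetical data taken from the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("43511\n24622\n53213\n43534\n57656\n12121\n")
    offset_demo_path = tmp.name

offset = find_offset(offset_demo_path, b"43534")
rest = list(read_from_offset(offset_demo_path, offset))
print(rest)  # ['43534', '57656', '12121']
os.remove(offset_demo_path)
```

The first scan is still linear, but if the file does not change you can store the offset and `seek()` to it directly on later runs.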

Hope this helps!

Paulo Bu
  • I don't think you are iterating over the file correctly with `for line in f.readline():` . `f.readline()` will return a string, so iterating over the string will produce single characters. You're not iterating over the lines in the file, you're iterating over the characters in the first line. Anyway, we posted nearly identical solutions at basically the same time :p – David Marx Jul 12 '13 at 16:41

Just create a boolean flag that records whether you have already read the target string you are looking for. When you reach the string, flip the flag, which triggers your script to process the rest of the file.

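test = '43534'
past_test = False
with open(fname, 'r') as f:  # fname is the path to your file
    for line in f:
        if past_test:
            # do stuff with each line after the target
            print(line, end='')
        elif line.rstrip('\n') == test:  # strip the newline before comparing
            # handle `line` here too if the target line itself should be included
            past_test = True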
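
A possible variation, sketched here under the assumption that the matching line itself should be included: because a file object is its own iterator, you can `break` out of the search loop and keep consuming the same iterator, which avoids checking the flag on every remaining line. The name `from_target` and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def from_target(path, target):
    """Yield the first line equal to `target` and every line after it."""
    with open(path, "r") as f:
        for line in f:
            if line.rstrip("\n") == target:
                yield line        # include the matching line itself
                break             # stop searching; f remembers its position
        # Continue on the same iterator: no more comparisons are needed.
        yield from f


# Demo on a small temporary file (hypothetical data taken from the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("43511\n24622\n53213\n43534\n57656\n12121\n")
    flag_demo_path = tmp.name

tail = [line.rstrip("\n") for line in from_target(flag_demo_path, "43534")]
print(tail)  # ['43534', '57656', '12121']
os.remove(flag_demo_path)
```

If the target never appears, the search loop exhausts the file and nothing is yielded.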
David Marx