I'm trying to "map" a very large ASCII file. Basically, I read lines until I find a certain tag, and then I want to know the position of that tag so that I can seek back to it later to pull out the associated data.
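As an illustration of that kind of map, here is a minimal sketch that works independently of the `dropwhile`/`tell()` code in the question. It records the byte offset of each tagged line by summing line lengths by hand, so the result doesn't depend on how far the file object has read ahead. It assumes the tag is the literal prefix `b"Foo"` and opens the file in binary mode so that line lengths are byte counts. The function names `find_tag_offsets` and `read_line_at` are hypothetical.

```python
import os
import tempfile


def find_tag_offsets(path, tag=b"Foo"):
    """Return the byte offset of every line that starts with `tag`.

    Offsets are accumulated from the length of each line, so they do not
    depend on how much data the file object has buffered.
    """
    offsets = []
    pos = 0  # byte offset of the start of the current line
    with open(path, "rb") as f:  # binary mode: len(line) is a byte count
        for line in f:
            if line.startswith(tag):
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_line_at(path, offset):
    """Seek to a previously recorded offset and return the line found there."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.readline()


if __name__ == "__main__":
    # Small sample file with hypothetical contents, just for the demo.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"junk\nFoo 1\ndata a\nmore\nFoo 2\ndata b\n")
        path = tmp.name
    try:
        offsets = find_tag_offsets(path)
        print(offsets)  # -> [5, 23]
        for off in offsets:
            print(read_line_at(path, off))
    finally:
        os.remove(path)
```

Iteration still goes through the normal buffered reader, so the buffering is kept. The extra cost is one `len()` per line.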
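```python
from itertools import dropwhile

with open(datafile) as fin:
    # Skip lines until the first one that starts with 'Foo'
    ifin = dropwhile(lambda x: not x.startswith('Foo'), fin)
    header = next(ifin)     # the 'Foo' line itself
    position = fin.tell()   # hoped for: offset just past the header line
```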
Now, `fin.tell()` doesn't give me the right position. This question has been asked in various forms before. The reason is presumably that Python is buffering the file object, so Python is telling me where its file pointer is, not where my file pointer is. I don't want to turn off this buffering, because performance here is important. However, it would be nice to know if there is a way to determine how many bytes Python chooses to buffer.

In my actual application, as long as I'm close to the lines that start with `Foo`, it doesn't matter; I can drop a few lines here and there. So, what I'm actually planning on doing is something like:
```python
position = fin.tell() - buffer_size(fin)  # buffer_size() is the piece I don't know how to get
```
Is there any way to go about finding the buffer size?
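On the buffer-size question itself, here is a hedged sketch for Python 3. The mismatch described above is presumably the behaviour of Python 2's built-in file object. In Python 3, calling `tell()` on a text-mode file after `next()` raises `OSError` ("telling position disabled by next() call") instead of returning a wrong value. For binary files, `open()` picks a buffer size from the device's block size and falls back on `io.DEFAULT_BUFFER_SIZE`. Passing `buffering=` fixes the size explicitly. A binary-mode `BufferedReader` already accounts for its unread buffered bytes in `tell()`, so the subtraction in the plan above shouldn't be needed there. The 64 KiB size and the name `position_after_tag` are assumptions for illustration.

```python
import io
import os
import tempfile

# Hypothetical buffer size chosen for illustration; open() accepts any int > 1.
BUF_SIZE = 64 * 1024


def position_after_tag(path, tag=b"Foo", buffering=BUF_SIZE):
    """Return the byte offset just past the first line starting with `tag`.

    The file is opened in binary mode, where BufferedReader.tell() reports
    the logical position (bytes handed to the caller), not how far the
    read-ahead buffer has got, so no buffer-size correction is applied.
    Returns None if no line starts with `tag`.
    """
    with open(path, "rb", buffering=buffering) as f:
        for line in f:
            if line.startswith(tag):
                return f.tell()
    return None


if __name__ == "__main__":
    # Size open() falls back on when it cannot determine a block size.
    print("io.DEFAULT_BUFFER_SIZE =", io.DEFAULT_BUFFER_SIZE)

    # Sample file (hypothetical data): 100000 bytes of filler, then the tag.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write((b"x" * 99 + b"\n") * 1000)
        tmp.write(b"Foo header\n")
        tmp.write(b"payload\n")
        path = tmp.name
    try:
        # A small buffer forces many refills before the tag is reached.
        pos = position_after_tag(path, buffering=4096)
        print(pos)  # -> 100011
        with open(path, "rb") as f:
            f.seek(pos)
            print(f.readline())  # -> b'payload\n'
    finally:
        os.remove(path)
```

If exact positions are available this way, the "close enough" approximation isn't needed. The buffer size then only affects speed, not correctness.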