I'm having trouble working with large text files (about 1 GB) that I need to read repeatedly inside a while loop.
More specifically: I first parse the lines of the file to find, e.g., all lines that start with "x", and add the indices of the matching lines to a list (say l). This is the pre-processing part.
Now, in a while loop, I choose random indices from l and want to read the corresponding line (or, say, 5 lines around it). So I need to keep the file in memory for the whole while loop, since I do not know in advance which lines I will end up reading (the line is picked at random from l).
The problem is that when I open the file before my main loop, the read succeeds during the first iteration of the loop, but from the second iteration onward the file seems to have vanished from memory. What I have tried:
The preprocess part:
l = []
with open(filename) as f:
    for i, line in enumerate(f):
        prep = ''.join(c for c in line if c.isalnum() or c.isspace())
        if 'x' in prep:
            l.append(i)
Now I have my list l. Opening the file before the main loop:
with open(filename, 'r') as f:
    while (some condition):
        random_index = random.randrange(len(l) - 1)  # so that l[random_index + 1] exists
        out = open("out", "w")  # I will write the read line(s) here
        for i, line in enumerate(f):
            # the lines to be read, starting from the given random index
            if l[random_index] <= i < l[random_index + 1]:
                out.write(line)
        out.close()
Things work properly only during the first iteration of the loop. Alternatively, I also tried:
f = open(filename)
while (some condition):
    random_index = ...  # rest is the same as above
Same issue: only the first iteration works. One thing that did work was putting f = open(filename) inside the loop, so the file is reopened on every iteration. But since the file is large, this is not a practical solution.
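The symptom can be reproduced on a tiny file. The following is only a minimal sketch, assuming a hypothetical three-line temporary file stands in for the 1 GB one. The names path, first_pass, second_pass and third_pass are illustrative and not part of the code above. The f.seek(0) call is included purely for comparison, to show how the file position affects a repeated pass over the same file object.

```python
import os
import tempfile

# Hypothetical stand-in for the large file: three short lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("x first\ny second\nx third\n")
    path = tmp.name

with open(path, "r") as f:
    first_pass = [line for line in f]   # iterates up to the end of the file
    second_pass = [line for line in f]  # file position is still at the end
    f.seek(0)                           # move the position back to the start
    third_pass = [line for line in f]   # iterates over all lines again

os.remove(path)

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 3 0 3
```

Here the second pass over the same open file object yields no lines at all, which matches what happens from the second iteration of the while loop onward.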
- What am I doing wrong here?
- How should such readings be done properly?