I have large log files that are in compressed format (e.g. largefile.gz); these are commonly 4-7 GB each.
Here's the relevant part of the code:
    import os
    import gzip

    for filename in os.listdir(path):
        if not filename.startswith("."):
            with open(b, 'a') as newfile, gzip.GzipFile(path + filename, 'rb') as oldfile:
                # BEGIN Reads each remaining line from the log into a list
                data = oldfile.readlines()
                for line in data:
                    parts = line.split()
After this the code does some calculations (basically totaling up the bytes) and writes to a file a line like "total bytes for x criteria = y". All of this works fine on a small file, but on a large file it kills the system.
What I think my program is doing is reading the whole file and storing it in data. Correct me if I'm wrong, but I think it's trying to put the whole log into memory first.
Question: how can I read one line from the compressed file, process it, then move on to the next, without storing the whole thing in memory first? (Or is it really already doing that? I'm not sure, but based on looking at Activity Monitor my guess is that it is trying to go all in memory.)
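For context, here is a minimal, self-contained sketch of the streaming behavior I'm after (the file name, sample contents, and per-line total are made up for illustration). As I understand it, iterating over the file object directly, instead of calling readlines(), should yield one decompressed line at a time:

    import gzip

    # Write a tiny sample log so this sketch runs end-to-end
    # (in my real case this would be one of the 4-7 GB .gz files).
    with gzip.open("sample.gz", "wt") as f:
        f.write("alpha 100\nbeta 200\ngamma 300\n")

    total_bytes = 0
    with gzip.open("sample.gz", "rt") as oldfile:
        for line in oldfile:              # streams one line at a time
            parts = line.split()
            total_bytes += int(parts[1])  # hypothetical per-line byte count

    print(total_bytes)

Is this the right pattern, or does iterating over a GzipFile still pull everything into memory behind the scenes?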
Thanks