I have written a Python script that reads the contents of two files: the first is relatively small (~30 KB) and the second is larger (~270 MB). The contents of both files are loaded into dictionary data structures. When the second file is loaded, I expected the amount of RAM required to be roughly equal to the size of the file on disk, perhaps with some overhead, but watching the RAM usage on my PC it consistently takes ~2 GB (around 8 times the size of the file). The relevant source code is below (pauses inserted just so I can see the RAM usage at each stage). The line consuming large amounts of memory is "tweets = map(json.loads, tweet_file)":
import json
import sys

scores = {}  # term -> integer score


def get_scores(term_file):
    # each line has the form "<term>\t<score>"
    for line in term_file:
        term, score = line.split("\t")  # tab character
        scores[term] = int(score)


def pause():
    raw_input('press Enter to continue: ')


def main():
    # get terms and their scores..
    print 'open word list file ...'
    term_file = open(sys.argv[1])
    pause()
    print 'create dictionary from word list file ...'
    get_scores(term_file)
    pause()
    print 'close word list file ...'
    term_file.close()
    pause()
    # get tweets from file...
    print 'open tweets file ...'
    tweet_file = open(sys.argv[2])
    pause()
    print 'create list of dictionaries from tweets file ...'
    tweets = map(json.loads, tweet_file)  # creates a list of dictionaries (one per tweet)
    pause()
    print 'close tweets file ...'
    tweet_file.close()
    pause()


if __name__ == '__main__':
    main()
Does anyone know why this is? My concern is that I would like to extend my research to larger files, but I will quickly run out of memory. Interestingly, the memory usage does not increase noticeably after opening the file (I think this just creates a file handle rather than reading the contents).
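To make the comparison concrete, here is a rough sketch of how the in-memory size of one parsed line could be compared with its size as text. It assumes CPython (where `sys.getsizeof` reports per-object sizes); `deep_sizeof` and `sample_line` are hypothetical names I made up for illustration, and the sample is much smaller than a real tweet, so the ratio it gives is only indicative.

```python
import json
import sys


def deep_sizeof(obj, seen=None):
    """Rough size in bytes of obj plus everything it references.

    Only dicts, lists and tuples are followed, which covers what
    json.loads produces. Objects are counted once by identity.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_sizeof(key, seen) + deep_sizeof(value, seen)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size


# Hypothetical tweet-like line; real tweets carry many more fields.
sample_line = '{"id": 1, "text": "hello world", "user": {"name": "a", "followers": 10}}'
parsed = json.loads(sample_line)

# Bytes held in memory per character of the original text.
ratio = deep_sizeof(parsed) / float(len(sample_line))
```

Every dict, string and number is a separate Python object with its own header, so I would expect `ratio` to come out well above 1 even for a small record like this.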
My idea is to loop through the file one line at a time, process what I can, and store only the minimum I need for future reference, rather than loading everything into a list of dictionaries. Still, I am interested to know whether the roughly 8x multiplier from file size to memory when creating the dictionaries is in line with other people's experience.
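A minimal sketch of the line-at-a-time idea, under some assumptions: each line of the tweets file is one JSON object, the tweet body is stored under a "text" key, and a tweet's score is the sum of the scores of its whitespace-separated words. The helper names `iter_tweets` and `score_tweet` are hypothetical, not part of my script.

```python
import json


def iter_tweets(lines):
    """Yield one parsed tweet at a time instead of building a list."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def score_tweet(tweet, scores):
    """Sum the scores of the words in the tweet text (assumed "text" key)."""
    text = tweet.get("text", "")
    return sum(scores.get(word, 0) for word in text.lower().split())


# Small in-memory stand-ins for the two files.
scores = {"good": 3, "bad": -3}
lines = ['{"text": "good day"}', '', '{"text": "bad good bad"}']

# Only one parsed tweet is alive at a time; just the totals are kept.
totals = [score_tweet(tweet, scores) for tweet in iter_tweets(lines)]
# totals == [3, -3]
```

With a real file, the open file object can be passed directly, e.g. `iter_tweets(tweet_file)`, since iterating over a file yields one line at a time.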