I'm trying to read big text files (~10 GB each) and put each file's words into a list.
import itertools
from multiprocessing import Pool

pool = Pool()  # worker pool; addline, files and function are defined elsewhere in my code
corpus = []
for file in files:
    with open(file) as source:
        # use multiprocessing to run addline on every line of the file;
        # addline returns the list of words in a single line
        filewords = pool.map(addline, source)
    # concatenate the per-line sublists into one list with all the file's words
    filewords = list(itertools.chain(*filewords))
    corpus.append(filewords)
# do something with the list
function(corpus)
What should I do to make this more memory-efficient? With generators, maybe? (I have no experience with them.)
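For concreteness, is something along these lines the right direction? This is only a rough sketch, assuming addline still returns the words of one line and that function can consume any iterable rather than a fully built list:

import itertools

def words_per_file(files):
    # lazily yield one file's word list at a time instead of
    # building the whole corpus in memory up front
    for file in files:
        with open(file) as source:
            yield list(itertools.chain.from_iterable(addline(line) for line in source))

# function would then iterate over the generator instead of a list
function(words_per_file(files))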