I have a script that takes all the csv files in a directory and merges them side-by-side, using an outer join. The problem is that my computer chokes (MemoryError) when I try to use it on the files I need to join (about two dozen files 6-12 Gb each). I am aware that itertools can be used to make loops more efficient, but I am unclear as to whether or how it could be applied to this situation. The other alternative I can think of is to install mySQL, learn the basics, and do this there. Obviously I'd rather do this in Python if possible because I'm already learning it. An R-based solution would also be acceptable.
Here is my code:
import os
import glob
import pandas as pd
os.chdir("\\path\\containing\\files")
files = glob.glob("*.csv")
sdf = pd.read_csv(files[0], sep=',')
for filename in files[1:]:
df = pd.read_csv(filename, sep=',')
sdf = pd.merge(sdf, df, how='outer', on=['Factor1', 'Factor2'])
Any advice for how to do this with files too big for my computer's memory would be greatly appreciated.