I am trying to merge two large dataframes.
One dataframe (patent_id) has 5,271,459 rows and the other has more than 10,000 columns.
To combine these two big dataframes, I use "merge" and split the right dataframe into chunks (similar to MemoryError with python/pandas and large left outer joins).
But it still runs into a MemoryError. Is there any room for improvement?
Should I use "concat" rather than "merge"?
Or should I use the "csv" module rather than "pandas" to handle this, as in MemoryError with python/pandas and large left outer joins? (A rough sketch of what I mean by that is after my code below.)
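As a toy example of how I understand the merge/concat difference (the dataframes, the column names a and b, and the values below are made up, not my real data; only patent_id_0 is the real key):

    import pandas as pd

    left = pd.DataFrame({"patent_id_0": [1, 2], "a": ["x", "y"]})
    right = pd.DataFrame({"patent_id_0": [2, 1], "b": [20, 10]})

    # merge matches rows on the key column, regardless of row order
    merged = pd.merge(left, right, on=["patent_id_0"], how="left")

    # concat(axis=1) only aligns on the index, so both frames would first
    # have to be indexed by patent_id_0 for the rows to line up
    stacked = pd.concat([left.set_index("patent_id_0"),
                         right.set_index("patent_id_0")], axis=1)

Here is my current code: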
    import pandas as pd

    # patent_id (5,271,459 rows) and column_name (the list of keys) are defined earlier.
    for key in column_name:
        print(key)
        newname = '{}_post.csv'.format(key)
        patent_rotated_chunks = pd.read_csv(newname, iterator=True, chunksize=10000)
        temp = patent_id.copy(deep=True)
        # Left-join each 10,000-row chunk of the wide dataframe onto the full patent_id frame.
        for patent_rotated in patent_rotated_chunks:
            temp = pd.merge(temp, patent_rotated, on=["patent_id_0"], how='left')
        temp.to_csv('{}_sorted.csv'.format(key))
        del temp
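And here is a rough sketch of what I mean by the "csv" route, assuming each *_post.csv has a patent_id_0 column and that one such file fits in memory as plain lists; the file names are placeholders, not my real paths:

    import csv

    LEFT_FILE = "patent_id.csv"    # placeholder: the 5,271,459-row frame on disk
    RIGHT_FILE = "key_post.csv"    # placeholder: one of the wide *_post.csv files
    OUT_FILE = "key_sorted.csv"    # placeholder: output for this key

    # Build a lookup from patent_id_0 to the wide row, streaming the right file once.
    # Keys are compared as strings, so both files must format patent_id_0 identically.
    right_rows = {}
    with open(RIGHT_FILE, newline="") as f:
        reader = csv.reader(f)
        right_header = next(reader)
        key_idx = right_header.index("patent_id_0")
        for row in reader:
            right_rows[row[key_idx]] = row

    # Stream the left file and append the matching right-hand columns (left join),
    # writing each output row immediately instead of holding a merged frame in memory.
    with open(LEFT_FILE, newline="") as fin, open(OUT_FILE, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        left_header = next(reader)
        left_key_idx = left_header.index("patent_id_0")
        writer.writerow(left_header + right_header)
        empty = [""] * len(right_header)
        for row in reader:
            writer.writerow(row + right_rows.get(row[left_key_idx], empty))

Would something like this be the better direction, or is there a way to make the pandas version work within memory?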