I am trying to merge 2 dataframes and for some reason it blows out of proportion in memory. those 2 dataframes are relatively large but nothing out of the ordinary. I am using a strong machine with 128GB of RAM.
x = b.merge(a,left_on='id1',right_on='id',how='left')
This is the output:
MemoryError: Unable to allocate 1.58 TiB for an array with shape (216639452968,) and data type int64
I am probably doing something wrong here but can't understand why it gets to a 1.6 TB of memory requirment.
Here is some info on the dataframes:
print(a.info())
print(a.memory_usage(deep=True))
print(b.info())
print(b.memory_usage(deep=True))
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10092079 entries, 0 to 10092078
Data columns (total 2 columns):
# Column Dtype
--- ------ -----
0 id object
1 date datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 154.0+ MB
None
Index 128
id 654665935
date 80736632
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15000000 entries, 0 to 14999999
Data columns (total 2 columns):
# Column Dtype
--- ------ -----
0 id1 object
1 id2 object
dtypes: object(2)
memory usage: 228.9+ MB
None
Index 128
id1 965676606
id2 718661312
dtype: int64