I tried importing a 4 GB CSV file with pd.read_csv, but got an out-of-memory error. I then tried dask.dataframe, but could not convert the result to a pandas DataFrame, because compute() materializes the whole dataframe in memory (same memory error):
import dask.dataframe as dd

file = "data.csv"       # placeholder path to the 4 GB csv
df = dd.read_csv(file)  # lazy, nothing is loaded yet
df = df.compute()       # materializes the full dataframe -> MemoryError
I then tried the chunksize parameter, but got the same memory error:
import pandas as pd

file = "data.csv"  # placeholder path to the 4 GB csv
df = pd.read_csv(file, chunksize=1000000, low_memory=False)  # iterator of DataFrame chunks
df = pd.concat(df)  # fails: the concatenated result must fit in memory
I also tried chunksize with a list of chunks, same error:
import pandas as pd

file = "data.csv"  # placeholder path to the 4 GB csv
chunks = []        # avoid shadowing the built-in list
for chunk in pd.read_csv(file, chunksize=1000000, low_memory=False):
    chunks.append(chunk)
df = pd.concat(chunks)  # MemoryError raised here
Attempts:
1. Tried with a 1.5 GB file: imported successfully
2. Tried with the 4 GB file: failed with a memory error
3. Tried smaller chunksize values (2000 and 50000): still a memory error for the 4 GB file (see the sketch after this list)
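For reference, attempt 3 is the same loop as above with only the chunksize changed (the file path is a placeholder):
import pandas as pd

file = "data.csv"  # placeholder path to the 4 GB csv
chunks = []
for chunk in pd.read_csv(file, chunksize=2000, low_memory=False):  # also tried 50000
    chunks.append(chunk)
df = pd.concat(chunks)  # memory error here as well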
Please let me know how to proceed. I am using Python 3.7 with 8 GB of RAM.
I also tried attempt 3 on a server with 128 GB of RAM, but still got a memory error.
I cannot assign dtype up front, because the CSV files to be imported can contain different columns at different times.
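For illustration, a fixed mapping like the one below (the column names and path are made up) is what I cannot rely on, because the set of columns changes from file to file:
import pandas as pd

# hypothetical column names -- the real files do not always contain these
dtypes = {"id": "int32", "category": "category", "value": "float32"}
df = pd.read_csv("data.csv", dtype=dtypes)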