I have an .xlsx file with 11 columns, 15M rows, and 198 MB in size. It's taking forever to read and work with in pandas. After reading Stack Overflow answers, I switched to dask and modin. However, I'm receiving the following error when using dask:
import dask.dataframe as dd
df = dd.read_csv('15Lacs.csv', encoding='unicode_escape')
C error: out of memory
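Would something along these lines be the right way to use dask here? The blocksize value, latin-1 encoding, and assume_missing flag below are guesses on my part, not something I've confirmed for this file:

import dask.dataframe as dd

# Guesses on my part: a smaller blocksize so each partition fits in RAM, and
# latin-1 instead of unicode_escape, since the file has stray 0xa0 bytes.
df = dd.read_csv(
    '15Lacs.csv',
    blocksize='64MB',     # split the file into roughly 64 MB partitions
    encoding='latin-1',   # latin-1 accepts any byte value, so no decode errors
    assume_missing=True,  # read integer-looking columns as float to tolerate NaNs
)

# Keep everything lazy and only compute a reduced result, not the full frame.
print(df.shape[0].compute())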
When I use modin['ray'], I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 112514: invalid start byte
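For completeness, would passing a different encoding straight through to modin's read_csv (it takes the same keyword arguments as pandas) get past the decode error? The latin-1 choice below is just a guess:

import os
os.environ["MODIN_ENGINE"] = "ray"  # select the Ray engine before importing modin

import modin.pandas as mpd

# Same keyword arguments as pandas.read_csv; latin-1 is a guess to get past
# the 0xa0 byte that the utf-8 codec rejects.
df = mpd.read_csv('15Lacs.csv', encoding='latin-1')
print(df.shape)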
Is there a more efficient way to import large .xlsx or .csv files into Python on average hardware?
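Or is plain pandas with chunked reads already good enough on this kind of hardware? A minimal sketch of what I mean; the chunk size, encoding, and the per-chunk describe() call are just placeholders for my real processing:

import pandas as pd

results = []
# Stream the CSV in 100,000-row chunks so the whole file never sits in memory
# at once; each chunk is an ordinary DataFrame.
for chunk in pd.read_csv('15Lacs.csv', chunksize=100_000, encoding='latin-1'):
    results.append(chunk.describe())  # placeholder for the real per-chunk work

print(len(results))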