Loading a 5 GB CSV file with the following code crashes my PC:
import pandas as pd
a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python')
However, this code does seem to handle the "field larger than field limit" issue, since the log contains messages such as:
Skipping line 1435768: field larger than field limit (131072)
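From what I understand, 131072 bytes is the default field size limit of Python's csv module, which the python engine relies on, so presumably the limit itself could be raised before calling read_csv (even though my real goal is to skip these lines rather than keep the oversized fields). A rough sketch of what I mean:
import csv
import pandas as pd

# Raise the csv module's field size limit (default is 131072 bytes).
# An explicit value is used because csv.field_size_limit(sys.maxsize)
# can overflow on some platforms.
csv.field_size_limit(10 * 1024 * 1024)  # allow fields up to 10 MB, adjust as needed

a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python')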
I tried to work around my PC's memory limitations by reading in chunks:
a = pd.DataFrame()
for chunk in pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python', chunksize=10000):
    a = pd.concat([a, chunk], ignore_index=True)
But this code seems to ignore error_bad_lines=False, because it stops with the following error message:
Error: field larger than field limit (131072)
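(Side note: I realise that calling pd.concat inside the loop re-copies the growing DataFrame on every iteration; collecting the chunks in a list and concatenating once would be lighter on memory, but since the error is raised while a chunk is being read, I don't expect it to change the outcome. A sketch of that variant:)
chunks = []
for chunk in pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python', chunksize=10000):
    chunks.append(chunk)  # keep each chunk as-is
a = pd.concat(chunks, ignore_index=True)  # concatenate a single time at the end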
I also tried reading only part of the file, but again the bad lines are not skipped:
a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python', skiprows=0, nrows=1000000)
which results in:
Error: field larger than field limit (131072)
Any suggestions on how to overcome this problem? I'd like to skip these problematic lines.
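To make the goal concrete, here is roughly the behaviour I'm after, sketched with the csv module directly (a hypothetical pre-filtering step, assuming a comma delimiter, a header in the first row, and that no field exceeds the raised limit):
import csv
import pandas as pd

csv.field_size_limit(10 * 1024 * 1024)  # raise the limit so oversized rows can be inspected

good_rows = []
with open('file.csv', encoding='latin1', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        # skip rows whose field count doesn't match the header
        if len(row) != len(header):
            continue
        # skip rows containing a field larger than the default 131072-byte limit
        if max((len(field) for field in row), default=0) > 131072:
            continue
        good_rows.append(row)

# note: unlike read_csv, this keeps every value as a string (no dtype inference)
a = pd.DataFrame(good_rows, columns=header)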