
Loading a 5 GB CSV file with the following code crashes my PC.

import pandas as pd
a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python')
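(As an aside, part of the crash is probably plain memory pressure: 5 GB of CSV grows considerably once parsed into a DataFrame. If only some columns are needed, usecols and dtype shrink the footprint; the column names below are placeholders, and this does nothing about the field-limit error:)

import pandas as pd

# placeholder column names; restricting columns and dtypes cuts memory
a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1',
                engine='python', usecols=['col_a', 'col_b'],
                dtype={'col_a': 'float32', 'col_b': 'category'})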

However, the full read does seem able to deal with the "field larger than field limit" issue, since the log shows messages such as:

Skipping line 1435768: field larger than field limit (131072)
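As far as I can tell, 131072 (128 × 1024) is the default field size limit of Python's built-in csv module, which the python engine of pandas uses under the hood. That limit can be raised so the oversized fields load instead of erroring, e.g.:

import csv

# raise the csv module's per-field limit (default is 131072);
# a fixed value avoids the OverflowError that sys.maxsize can trigger on some platforms
csv.field_size_limit(10**9)

but I'd rather skip those rows than load them.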

I tried to work around my PC's memory limits by reading in chunks:

a = pd.DataFrame()
for chunk in pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python', chunksize=10000):
    a = pd.concat([a, chunk], ignore_index=True)

But this code ignores "error_bad_lines=False" as it stops with the following error message:

Error: field larger than field limit (131072)
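(As an aside, concatenating inside the loop recopies the whole accumulated frame on every iteration; collecting the chunks in a list and concatenating once is much cheaper, although the read still stops with the same error:)

import pandas as pd

chunks = []
for chunk in pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1',
                         engine='python', chunksize=10000):
    chunks.append(chunk)
a = pd.concat(chunks, ignore_index=True)  # one concat instead of one per chunk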

I also tried reading only part of the file, but again the bad lines are not skipped:

a = pd.read_csv('file.csv', error_bad_lines=False, encoding='latin1', engine='python', skiprows=0, nrows=1000000)

results in:

Error: field larger than field limit (131072)

Any suggestions on how to overcome this problem? I'd like to skip these problematic lines.
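Update: the closest I have come to actually skipping these rows is pre-filtering the raw file before pandas sees it. A rough sketch (file_filtered.csv is just a scratch name, and the length test assumes an oversized field implies an oversized physical line, so quoted fields spanning lines, or long lines made of many small fields, can be misjudged):

import pandas as pd

FIELD_LIMIT = 131072  # the csv module's default per-field limit

# copy the file, dropping any physical line that could contain an oversized field
with open('file.csv', encoding='latin1') as src, \
     open('file_filtered.csv', 'w', encoding='latin1') as dst:
    for line in src:
        if len(line) <= FIELD_LIMIT:
            dst.write(line)

a = pd.read_csv('file_filtered.csv', error_bad_lines=False,
                encoding='latin1', engine='python')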

Niv
  • Not a `machine-learning` question, kindly do not spam irrelevant tags (removed). – desertnaut Dec 05 '20 at 18:00
  • Does this answer your question? [_csv.Error: field larger than field limit (131072)](https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072) – Gigioz Dec 05 '20 at 18:35
  • Thanks @Gigioz but this doesn't help as they are not using pandas. I'd also like to skip these rows rather than find a way to load them. – Niv Dec 05 '20 at 19:10
  • The duplicate has an answer which discusses how to solve this in Pandas, too. – tripleee Feb 19 '21 at 06:05
