I'm trying to read a large and complex CSV file with pandas.read_csv. The exact command is

pd.read_csv(filename, quotechar='"', low_memory=True, dtype=data_types, usecols=columns, true_values=['T'], false_values=['F'])

I am pretty sure the data types are correct. I can read the first 16 million lines (setting nrows=16000000) without problems, but somewhere after that I get the following error:

ValueError: could not convert string to float: '1,123'

It seems that, for some reason, pandas is treating two columns as one.

What could be the problem? How can I fix it?

SpiderOtto
  • Is it missing an expected delimiter in that row of data? – Dan Dec 16 '15 at 18:03
  • 2
    Have you done a visual inspection of the line at which the error is raised? Alternatively, could you provide us with that line +/- 1 line (so three lines in total)? – Nelewout Dec 16 '15 at 18:04
  • If losing some data is not an issue, you could probably add 'error_bad_lines=False' in order to skip the problematic rows – Ezer K Dec 16 '15 at 18:12
  • I think this is very hard to diagnose without seeing the problematic rows. But you can check for division by zero: a string like `something/0` can cause this error. – jezrael Dec 16 '15 at 20:39
  • 1
    How can I find the row? The error message does not say the row. – SpiderOtto Dec 16 '15 at 20:55
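
One way to locate such a row, as a sketch: read the file with every column as a string (so parsing cannot fail) and then test which values do not convert to float. Here "data.csv" and "value_col" are placeholders for the actual file and the numeric column in question.

    import pandas as pd

    # Read everything as strings so nothing fails during parsing,
    # then look for values that cannot be converted to float.
    df = pd.read_csv("data.csv", quotechar='"', dtype=str)

    def not_floatable(value):
        # True if the value cannot be parsed as a float
        try:
            float(value)
            return False
        except (TypeError, ValueError):
            return True

    bad = df[df["value_col"].map(not_floatable)]
    print(bad.index.tolist())  # row positions of values like '1,123'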

1 Answer

I found the mistake. The problem was a thousands separator.

When the CSV file was written, most of the numbers were below one thousand and were written out correctly. This one value, however, was greater than one thousand and was written as "1,123", which pandas read not as a number but as a string.
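
The fix is to tell read_csv about the separator via its thousands parameter. A minimal sketch, with "data.csv" and the dtype/column variables standing in for the real ones from the question:

    import pandas as pd

    data_types = {"value_col": float}  # placeholder dtype mapping
    columns = ["value_col"]            # placeholder column selection

    df = pd.read_csv(
        "data.csv",
        quotechar='"',
        dtype=data_types,
        usecols=columns,
        true_values=['T'],
        false_values=['F'],
        thousands=',',  # '1,123' is now parsed as the number 1123.0
    )

This works even in a comma-delimited file because the value was written as "1,123" with quotes: the quotechar keeps it a single field, and the thousands separator is then stripped before the float conversion.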

SpiderOtto