pandas read_csv() crashing app with segmentation fault when a column value in the top row contains e-

Question

I found a defect in pandas that crashes a Python app immediately with a segmentation fault, without logging any error.

OS: Ubuntu 20.04

Python: 3.8.5

Pandas: 1.2.0

We have a CSV file whose header and first data row look like this:

id column,column2, column3,column4,...other columns

1178200e-6546,value,value,value…

The code is as simple as this: `pd.read_csv('filename.csv')`

Reason:

Pandas infers each column's data type by parsing the CSV data. It appears to treat '1178200e-' as the start of a number in scientific notation and tries to convert it to a numeric value using the remaining part of the string as the exponent. It seems to fail to parse this value gracefully and crashes without raising any error. This is what we found by testing various scenarios; we have not yet looked into the pandas code.

However, if a different row is moved to the top, the problem does not occur, because a first row with clearly non-numeric data makes the column's data type 'object'.

Solution:

1) Either provide the data type explicitly.
2) Or don't use this version. It works properly with an older Python version. We still need to check which is the most recent version where this functionality works.
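A minimal sketch of the first workaround, for illustration only. The sample data and the column names `id`, `column2` and `column3` are made up (only the shape of the first id value is taken from the file described above), and `io.StringIO` stands in for `filename.csv`:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for filename.csv; only the shape of
# the first id value (digits, "e-", digits) comes from the question.
csv_text = "id,column2,column3\n1178200e-6546,10,a\n2278300x-1234,20,b\n"

# Read only the id column as strings; the other columns are still inferred.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": str})

print(df["id"].tolist())
print(df["column2"].dtype)
```

Only the column named in `dtype` is forced to text here, so this helps only when the affected column is known in advance.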

This problem occurs only on Ubuntu. I tested the same code on Windows and Red Hat Linux, and it works fine there.

Does anybody know how to solve this problem other than by providing the data type explicitly?

Did you try `pd.read_csv('filename.csv', dtype={'id': object})`? — Vaishali, Jan 28 '21 at 15:16
That will work, but I have a scenario where I will not know which columns will be in the file or which of them may contain such characters. I can't use dtype=str, because it will convert all columns to strings, which will create other problems. — john mathew, Jan 28 '21 at 15:18
You can use dtype=str and explicitly convert the rest of the columns back to the required type using df['col'].astype(). Here is the [dupe](https://stackoverflow.com/questions/13293810/import-pandas-dataframe-column-as-string-not-int). Anyway, reopening the question in case anyone has a better suggestion. — Vaishali, Jan 28 '21 at 15:21
does the error occur for both `engine='c'` and `engine='python'` ? — Stef, Jan 28 '21 at 15:56
In pandas 1.2.1 with a similar column, I am not able to reproduce this issue. — Trenton McKinney, Jan 28 '21 at 16:17
Thanks for the comments, guys. @TrentonMcKinney this issue is solved in pandas 1.2.1, but I still don't know why it occurs only on Ubuntu. It is not reproducible on Windows or Red Hat Linux. — john mathew, Jan 28 '21 at 16:53
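
The suggestion in the comments to read everything with dtype=str and convert selected columns back could look roughly like the sketch below. The data and the column names `id`, `count` and `price` are hypothetical, and `io.StringIO` stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical data; the column names are illustrative only.
csv_text = "id,count,price\n1178200e-6546,3,1.5\n1178201e-6547,4,2.25\n"

# Read every column as text so values such as the id are kept as strings.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Convert the columns known to be numeric; the id column stays a string.
typed = raw.astype({"count": "int64", "price": "float64"})

print(typed.dtypes)
```

This still requires knowing which columns are numeric, so it does not by itself cover the case where the file's columns are unknown in advance.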

0 Answers