I found a defect in pandas that crashes a Python application immediately without logging any error (segmentation fault).
OS: Ubuntu 20.04
Python: 3.8.5
Pandas: 1.2.0

We have a CSV file whose header and first data row look like this:
id column,column2, column3,column4,...other columns
1178200e-6546,value,value,value…
The code is as simple as this:
pd.read_csv('filename.csv')
Reason:
Pandas infers column data types by parsing the CSV data. When it sees '1178200e-', it assumes the value is a number in scientific notation and tries to convert it to a float, taking the remaining digits (6546) as the exponent. It seems to fail to parse this value gracefully and crashes without any error. This is what we found by testing various scenarios; we have yet to look into the pandas code.
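To illustrate why the value looks numeric, here is a minimal standard-library check. It does not use pandas and is not how pandas parses internally; the regex and the helper `looks_like_scientific` are hypothetical, for illustration only. The whole id matches the usual mantissa + `e` + exponent pattern, and Python's own `float()` accepts it, silently underflowing to 0.0 because 1178200 × 10^-6546 is far below the smallest representable double.

```python
import re

# Hypothetical helper: a rough regex for a decimal float literal with an
# exponent (an illustration, not pandas' actual parsing logic).
SCI_NOTATION = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+")


def looks_like_scientific(value: str) -> bool:
    """Return True if the whole string is mantissa + 'e' + exponent."""
    return SCI_NOTATION.fullmatch(value) is not None


raw_id = "1178200e-6546"
print(looks_like_scientific(raw_id))       # True
print(float(raw_id))                       # 0.0 (underflow, no error raised)
print(looks_like_scientific("1178200e-"))  # False: no exponent digits
```

So the id is not malformed from a parser's point of view; it is a valid float literal with an extreme exponent.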
However, if you move another row to the first position, the issue does not occur: when the first row contains proper non-numeric data, the column's data type becomes 'object'.
Solution:
1) Provide the data type explicitly, or
2) Don't use this version. It works properly with an older Python version. We still need to find the most recent version in which this works.
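A minimal sketch of workaround 1, assuming the id column should be kept as text. Passing `dtype` for that column stops pandas from trying to parse it as a number. `io.StringIO` stands in for filename.csv here, and the row values are placeholders based on the example above.

```python
import io

import pandas as pd

# Stand-in for filename.csv (header and first data row from the example above).
csv_text = "id column,column2, column3,column4\n1178200e-6546,value,value,value\n"

# Read the id column as a string so pandas never attempts a numeric
# (scientific-notation) conversion on it.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id column": str})

print(df["id column"].iloc[0])  # 1178200e-6546
```

If you don't know in advance which columns are affected, `dtype=str` reads every column as text, and the genuinely numeric columns can be converted afterwards with `pd.to_numeric`.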
This problem occurs only on Ubuntu; we tested the same code on Windows and Red Hat Linux, and it works fine there.
Does anybody know how to solve this problem other than by providing the data type explicitly?