
I am working with a dataframe and I have to convert a column to int type.

I use the following:

result_df['ftmSectionId'] = result_df['ftmSectionId'].astype('int') 

The DF has several million rows, and apparently some values cannot be converted to int (perhaps because they contain commas or periods). I get the error:

ValueError: invalid literal for int() with base 10: 'not'

According to the question "How do I fix invalid literal for int() with base 10 error in pandas",

I could use:

data.Population1 = pd.to_numeric(data.Population1, errors="coerce")

Which works.

But this way I don't know why I got the error in the first place. Given the nature of the database I am working with, I would expect that column to contain only integers. How can I query the column to find out which values cannot be converted to int with the simple .astype('int') approach?

Thanks

Possibly related questions (not duplicates): "Unable to convert pandas dataframe column to int variable type using .astype(int) method". That question addresses the same problem, except that there the cause is already known: the column contains NaN values, which they remove. Here I don't know what the problem is; my goal is not only to convert to int but to catch the problematic values.

JFerro

2 Answers


You can still use errors="coerce" and then select the values of the original series where the result is NaN:

s = pd.Series(["apple", "1.0", "2", -3, "pear", "12,84"])

nans = pd.to_numeric(s, errors="coerce").isna()

Then boolean indexing gives:

>>> s[nans]

0    apple
4     pear
5    12,84
dtype: object
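
Note that to_numeric accepts "1.0", while the error message in the question ("invalid literal for int() with base 10") suggests each element goes through Python's int(), which rejects "1.0". A minimal sketch that mirrors int() directly, assuming the column has object dtype; the helper name fails_int is hypothetical:

```python
import pandas as pd

def fails_int(value) -> bool:
    """Return True if Python's int() rejects value (hypothetical helper)."""
    try:
        int(value)
        return False
    except (ValueError, TypeError):  # bad strings, NaN (ValueError), None (TypeError)
        return True

s = pd.Series(["apple", "1.0", "2", -3, "pear", "12,84"])
bad = s[s.map(fails_int)]  # boolean indexing with the per-element result
print(bad)
```

Unlike the to_numeric check, this also flags "1.0", so the result lists the same values that would make .astype('int') fail.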
Mustafa Aydın

Here is an additional alternative:

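import numpy as np
import pandas as pd

def check_float(value):
    try:
        float(value)
        return np.nan  # parses as a number, so nothing to report
    except (ValueError, TypeError):
        return value  # keep the value that could not be parsed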

We can then call the function on a Series with apply:

test = pd.Series([42, 3.1415, 'banana'])
test.apply(check_float)

0       NaN
1       NaN
2    banana
dtype: object

I am not sure how well this scales, though, since apply calls a Python function on every row.
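
For millions of rows, one option is to stay within pandas' vectorized string methods and summarize the trouble values with value_counts instead of listing every row. A sketch, assuming pandas 1.1 or newer (for Series.str.fullmatch) and a stricter notion of "integer" than int() itself; the sample data is made up, and whether this beats apply on real data is worth timing:

```python
import pandas as pd

# Hypothetical sample data; in the question this would be result_df['ftmSectionId'].
s = pd.Series(["12", "not", "7", "12,84", "not", "-3"])

# Integer-like = optional sign followed by digits only. This only approximates int():
# it rejects e.g. " 7 " or "1_000", which int() would accept.
is_int_like = s.astype(str).str.fullmatch(r"[+-]?\d+").astype(bool)

# Count each distinct offending value rather than printing every bad row.
offenders = s[~is_int_like].value_counts()
print(offenders)
```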

Here is a post discussing this issue: Finding invalid values in numerical columns

Mohammad Fasha