Why can't I filter Pandas dataframe on numeric column

Question

I am using Pandas to analyze data from a CSV file. The dataframe looks like this:

    tech_nbr    door_age    service_spend   service_calls
0   2   -7,987  1   3
1   3   -7,987  1   3
2   231561  -7,987  1   3
3   2531885 13  1   3
4   A451349 9   1   3

Now I want to filter out all the rows with a negative door_age, such as rows 0 and 1, using the following command:

df_filtered = df.filter(df.door_age > 0)

However, I got an error:

TypeError: '>' not supported between instances of 'str' and 'int'
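The error is raised while evaluating `df.door_age > 0`, before `filter` is even called: the column holds Python strings (dtype `object`), and Python cannot compare a `str` with an `int`. (Separately, `DataFrame.filter` selects rows or columns by their labels via `items`, `like` or `regex`, not by a boolean condition; row selection by a condition is written with boolean indexing, `df[df['door_age'] > 0]`.) A minimal reproduction, assuming the values were read in as the strings shown in the table:

```python
import pandas as pd

# door_age stored as text; dtype=object is set explicitly because
# newer pandas versions may infer a dedicated string dtype instead
df = pd.DataFrame({"door_age": ["-7,987", "-7,987", "-7,987", "13", "9"]}, dtype=object)
print(df["door_age"].dtype)  # object

error_message = None
try:
    df["door_age"] > 0  # compares each Python str with the int 0
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```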

I guess some of the age values are not numeric, so I added the following line to drop rows with a non-numeric door_age, based on "Remove non-numeric rows in one column with pandas":

df[df.door_age.apply(lambda x: x.isnumeric())]
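Two details of this line are worth checking. Its result is never assigned, so `df` itself is unchanged. And `str.isnumeric()` is `True` only when every character is a digit, so strings with a minus sign or a comma such as `"-7,987"` are rejected (a missing value would instead raise `AttributeError`, since `float` has no `isnumeric` method). A short sketch using values from the table:

```python
import pandas as pd

df = pd.DataFrame({"door_age": ["-7,987", "-7,987", "13", "9"]}, dtype=object)

# isnumeric() rejects the minus sign and the comma
mask = df["door_age"].apply(lambda x: x.isnumeric())
print(mask.tolist())  # [False, False, True, True]

df[mask]          # builds a filtered copy but leaves df untouched
print(len(df))    # 4
df = df[mask]     # assign the result to actually keep only those rows
print(len(df))    # 2
```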

It did seem to remove a lot of rows, but I still got the same error. So I also filtered out rows with null values for door_age using `df = df.dropna(subset=['door_age'])`. However, that did not help either.
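Dropping rows, whether with a boolean mask or with `dropna`, never changes a column's dtype: the surviving values are still strings in an `object` column, so `> 0` fails as before. Only an explicit conversion changes the dtype. A sketch, assuming the rows left after filtering contain only digit strings:

```python
import pandas as pd

df = pd.DataFrame({"door_age": ["-7,987", None, "13", "9"]}, dtype=object)

df = df.dropna(subset=["door_age"])
df = df[df["door_age"].apply(lambda x: x.isnumeric())].copy()  # copy avoids a chained-assignment warning
print(df["door_age"].dtype)  # still object: the values are the strings "13" and "9"

# An explicit conversion is what changes the dtype; then the comparison works
df["door_age"] = df["door_age"].astype(int)
df_filtered = df[df["door_age"] > 0]
print(df_filtered)
```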

Why am I still getting this error?

Can you *explicitly check* the `dtype` of your numeric column before and after your attempt to remove non-numeric rows? You can use `df.dtypes` or `series.dtype` for this. — jpp, Mar 30 '18 at 21:10
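A sketch of that check on a hypothetical two-column sample: `df.dtypes` lists every column's dtype, and `pd.api.types.is_numeric_dtype` answers the yes/no question directly.

```python
import pandas as pd

df = pd.DataFrame(
    {"door_age": ["-7,987", "13", "9"], "service_calls": [3, 3, 3]}
).astype({"door_age": object})

print(df.dtypes)             # door_age is object, service_calls is an integer dtype
print(df["door_age"].dtype)  # object
print(pd.api.types.is_numeric_dtype(df["door_age"]))       # False
print(pd.api.types.is_numeric_dtype(df["service_calls"]))  # True
```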
@jpp it is `object` before and after. Should I change the whole column type at the beginning then? — ddd, Mar 30 '18 at 21:17
Yes, use `df[col] = pd.to_numeric(df[col], errors='coerce')`. Non-numeric values will become `np.nan`. — jpp, Mar 30 '18 at 21:19
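What `errors='coerce'` does to values from this table, as a sketch: anything that cannot be parsed as a number becomes `NaN`, and because `NaN` is a float the result is `float64`. Note that `"-7,987"` is one of the unparseable values, because of the comma.

```python
import pandas as pd

s = pd.Series(["-7,987", "13", "9"], dtype=object)

converted = pd.to_numeric(s, errors="coerce")  # unparseable strings -> NaN
print(converted.tolist())  # [nan, 13.0, 9.0]
print(converted.dtype)     # float64
```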

score 1 · Accepted Answer · answered Mar 30 '18 at 21:30


You need to ensure your series is of numeric type before you attempt any numeric calculations.

In this case, you can coerce non-numeric values to np.nan:

df['door_age'] = pd.to_numeric(df['door_age'], errors='coerce')
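If the commas in values such as `-7,987` are thousands separators, coercing them to `NaN` throws away real (negative) ages. A sketch of one alternative, assuming commas only ever appear as thousands separators: strip them, convert, and then filter with boolean indexing. `pd.read_csv` also has a `thousands` parameter that should handle this at parse time.

```python
import pandas as pd

df = pd.DataFrame(
    {"tech_nbr": ["2", "3", "231561", "2531885", "A451349"],
     "door_age": ["-7,987", "-7,987", "-7,987", "13", "9"]},
    dtype=object,
)

# Assumption: commas are thousands separators, so remove them before converting
cleaned = df["door_age"].str.replace(",", "", regex=False)
df["door_age"] = pd.to_numeric(cleaned, errors="coerce")  # other junk still becomes NaN

# Boolean indexing; NaN > 0 is False, so unconvertible rows are dropped too
df_filtered = df[df["door_age"] > 0]
print(df_filtered)
```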

answered by jpp
