2

I have the following column in a pandas DataFrame:

col1
1.2
1.4
3.1
aa
bb
NaN

I need to calculate the minimum value in the column col1 while ignoring all empty and non-numeric values.

If I call df['col1'].min(), it ignores only the empty values, and I still get this error:

TypeError: '<=' not supported between instances of 'float' and 'str'
Tatik
  • This post looks like it might answer your question: [remove non-numeric rows](https://stackoverflow.com/questions/33961028/remove-non-numeric-rows-in-one-column-with-pandas) – JSells Apr 03 '19 at 19:16

2 Answers

4

Try with pd.to_numeric():

pd.to_numeric(df.col1, errors='coerce').min()
# 1.2
# or, element by element (slow):
# df.col1.apply(lambda x: pd.to_numeric(x, errors='coerce')).min()
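
For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame construction is my assumption about how the question's column is stored (an object column mixing floats, strings, and NaN); it is not from the original post.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's column: floats, strings, and a NaN
df = pd.DataFrame({"col1": [1.2, 1.4, 3.1, "aa", "bb", np.nan]})

# errors="coerce" replaces anything that cannot be parsed as a number with NaN
numeric_col = pd.to_numeric(df["col1"], errors="coerce")

# Series.min() skips NaN by default (skipna=True)
col_min = numeric_col.min()
print(col_min)  # 1.2
```

The coerced column is float64, so the comparison between float and str that caused the TypeError never happens.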
petezurich
anky
1

I think of this as two steps:

  1. Convert all elements in the column to numeric types. NaN is itself a float value, so it's safe to coerce all string values to NaN.
  2. Call min on the resulting (cleaned) column.

To do step one, test each element to see whether it is an instance of numbers.Number, the base class for all Python numeric types. If it is, return the element; otherwise, return NaN.

import numbers
import numpy as np

def coerce_to_numeric(value):
    if isinstance(value, numbers.Number):
        return value
    else:
        return np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions

# Returns a cleaned version of df['col1']
clean_col = df['col1'].apply(coerce_to_numeric)

Then call .min() on the cleaned column to get its minimum value.

clean_col.min()
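
One caveat worth checking against your data: an isinstance test treats numeric-looking strings such as "0.5" as non-numeric, whereas pd.to_numeric parses them. The sketch below illustrates the difference; the sample column, including the extra "0.5" string, is a hypothetical example of mine and not from the question.

```python
import numbers

import numpy as np
import pandas as pd


def coerce_to_numeric(value):
    # Keep real numbers; replace everything else (including "0.5") with NaN
    return value if isinstance(value, numbers.Number) else np.nan


# Hypothetical column: the question's values plus one numeric-looking string
df = pd.DataFrame({"col1": [1.2, 1.4, 3.1, "aa", "bb", np.nan, "0.5"]})

# isinstance-based cleaning discards the string "0.5"
min_isinstance = df["col1"].apply(coerce_to_numeric).min()

# pd.to_numeric parses "0.5" into the float 0.5
min_to_numeric = pd.to_numeric(df["col1"], errors="coerce").min()

print(min_isinstance)  # 1.2
print(min_to_numeric)  # 0.5
```

Which result is right depends on whether strings that hold numbers should count as numeric in your column.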
eswan18