
Let's say there is a pandas DataFrame like this: {a: [1, 2, 3, 4], b: [1, 2, 3, ?]}. Assume the series actually contain more than a thousand values, and we do not yet know that there is a '?' somewhere in series 'b'; as a result, column 'b' keeps the object dtype.

How can we find out at which rows non-float (non-integer) values exist?

Brick

3 Answers


You could use something like this:

import numpy as np
import pandas as pd

def make_float(v):
    # Convert to float; return NaN for values that cannot be converted
    try:
        return float(v)
    except (TypeError, ValueError):
        return np.nan

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, '?']})

df_float = df.applymap(make_float)
# or just df_float = df.apply(pd.to_numeric, errors='coerce')

After this, df_float will have float dtype, with NaN wherever an entry could not be converted. Note that this also converts valid numeric strings (e.g., '0.7') to floats; you have to decide whether that's a good thing.
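As a quick illustration of that conversion behavior, here is a minimal sketch using pd.to_numeric on a small Series:

```python
import pandas as pd

# Valid numeric strings are converted; anything else becomes NaN
s = pd.Series(['1', '0.7', '?'])
converted = pd.to_numeric(s, errors='coerce')
print(converted.tolist())  # [1.0, 0.7, nan]
```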

You can then find the location of the NaN values (formerly the non-convertible entries in df) with this code (from https://stackoverflow.com/a/33641639/3830997):

df_nan = df_float.unstack()
df_nan = df_nan[df_nan.isnull()]
df_nan
# b  3    NaN
Matthias Fripp

You can easily achieve this using pandas:

df.apply(pd.to_numeric, errors='coerce').isnull().any()
Out[795]: 
a    False
b     True
dtype: bool

Data Input

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, '?']})
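The boolean Series above reports which columns contain bad values; since the question asks for the rows, the same coercion can also serve as a row mask (a small sketch with this df):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 3, '?']})

# Boolean mask: True for rows where any column failed numeric conversion
mask = df.apply(pd.to_numeric, errors='coerce').isnull().any(axis=1)
bad_rows = df[mask]
print(bad_rows.index.tolist())  # [3]
```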
BENY

Say you have multiple rows in the same column that are not numbers:

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': ['1', '2', '3', '?', '?', 4]})

You can get the positional indices of all those non-numbers using:

pd.isnull(pd.to_numeric(df['b'], errors='coerce')).to_numpy().nonzero()[0]

You get

array([3, 4])

If you need to do this over multiple columns like in this df,

df = pd.DataFrame({'a': [1, '?', 3, 4, 5, 6], 'b': ['1', '2', '3', '?', '?', 4]})

Try

pd.isnull(df.apply(pd.to_numeric, errors='coerce')).any(axis=1).to_numpy().nonzero()[0]

And you get

array([1, 3, 4])
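If you prefer index labels over positional arrays, the same mask can be fed to df.index instead of nonzero (a sketch with the same two-column df):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, '?', 3, 4, 5, 6], 'b': ['1', '2', '3', '?', '?', 4]})

# Index labels of rows with at least one non-numeric entry
mask = df.apply(pd.to_numeric, errors='coerce').isnull().any(axis=1)
bad_index = df.index[mask]
print(list(bad_index))  # [1, 3, 4]
```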
Vaishali