
df:

Org_Name   Emp_Name   Age  Salary
0  Axempl    Rick       29    1000
1  Lastik    John       34    2000
2  Xenon     sidd       47    9000
3  Foxtrix   Ammy      thirty 2000
4  Hensaui   giny       33    ten  
5  menuia    rony      fifty  7000
6  lopex     nick       23    Ninety

I want to loop over the numeric columns (Age, Salary) and check whether each value is numeric; if a numeric column contains a string value, I want to filter out that record and create a new data frame without those rows.

Output :

  Org_Name   Emp_Name   Age  Salary
0  Axempl    Rick       29    1000
1  Lastik    John       34    2000
2  Xenon     sidd       47    9000
3 Answers


You could extend this answer to filter on multiple columns for numerical data types:

import pandas as pd
from io import StringIO

data = """
Org_Name,Emp_Name,Age,Salary
Axempl,Rick,29,1000
Lastik,John,34,2000
Xenon,sidd,47,9000
Foxtrix,Ammy,thirty,2000
Hensaui,giny,33,ten  
menuia,rony,fifty,7000
lopex,nick,23,Ninety
"""

df = pd.read_csv(StringIO(data))
print('Original dataframe\n', df)

# Age and Salary were both read as strings (object dtype) here,
# so str.isnumeric can be applied to each value
df = df[(df.Age.apply(lambda x: x.isnumeric())) &
        (df.Salary.apply(lambda x: x.isnumeric()))]
print('Filtered dataframe\n', df)

gives

Original dataframe
   Org_Name Emp_Name     Age  Salary
0   Axempl     Rick      29    1000
1   Lastik     John      34    2000
2    Xenon     sidd      47    9000
3  Foxtrix     Ammy  thirty    2000
4  Hensaui     giny      33   ten  
5   menuia     rony   fifty    7000
6    lopex     nick      23  Ninety
Filtered dataframe
   Org_Name Emp_Name Age Salary
0   Axempl     Rick  29   1000
1   Lastik     John  34   2000
2    Xenon     sidd  47   9000
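The same filter generalizes to an arbitrary list of columns. A minimal sketch (the column list and the `str.strip` call are assumptions; stripping guards against stray whitespace like the trailing spaces in the `ten` row):

```python
import pandas as pd
from io import StringIO

data = """
Org_Name,Emp_Name,Age,Salary
Axempl,Rick,29,1000
Foxtrix,Ammy,thirty,2000
Hensaui,giny,33,ten
"""
df = pd.read_csv(StringIO(data))

# Check every listed column at once; astype(str) lets the same check
# run even if a column happened to be parsed as integers.
numeric_cols = ['Age', 'Salary']
mask = (df[numeric_cols]
        .apply(lambda s: s.astype(str).str.strip().str.isnumeric())
        .all(axis=1))
df = df[mask]
print(df)
```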
maxstrobel

I believe this can be solved using Pandas' "to_numeric" function.

import pandas as pd

df['Column to Check'] = pd.to_numeric(df['Column to Check'], downcast='integer', errors='coerce')

df.dropna(axis=0, inplace=True)

Where 'Column to Check' is the name of the column you are checking for values that cannot be cast as an integer (or any numeric type); in your question you will want to apply this to 'Age' and 'Salary'. "to_numeric" converts any value in those columns that cannot be cast to your selected type into NaN, and the "dropna" method then removes every row that has a NaN in any column.

To loop over the columns like you ask, you could do the following:

for col in ['Age', 'Salary']:
    df[col] = pd.to_numeric(df[col], downcast='integer', errors='coerce')
df.dropna(axis=0, inplace=True)

EDIT: In response to harry's comment. If there are preexisting NaNs in the data, something like the following should keep any valid row that had a preexisting NaN in one of the other columns.

for col in ['Age', 'Salary']:
    df[col] = pd.to_numeric(df[col], downcast='integer', errors='coerce')
    df = df[df[col].notnull()]
Conner
    dropna will also drop actual null values. – harry Nov 20 '19 at 02:51
  • That's a good point. There are other ways to drop the appropriate rows from the DataFrame if there are already existing NaNs in the data. – Conner Nov 20 '19 at 03:26
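Following up on the comment thread, a mask-only variant is one such "other way": build the coerced view of just the checked columns and keep rows where they all parse, leaving preexisting NaNs elsewhere untouched. A sketch with hypothetical sample data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Org_Name': ['Axempl', np.nan, 'Foxtrix'],  # preexisting NaN in a non-checked column
    'Age':      ['29', '34', 'thirty'],
    'Salary':   ['1000', '2000', '2000'],
})

# Keep rows where every checked column parses as a number;
# NaNs in other columns are left alone.
checked = df[['Age', 'Salary']].apply(pd.to_numeric, errors='coerce')
df = df[checked.notna().all(axis=1)]
print(df)
```

Unlike `dropna`, the row with the NaN in `Org_Name` survives, because only `Age` and `Salary` are consulted.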

You can use a mask to indicate whether or not there is a string-typed value among the Age and Salary columns:

# note: applymap is deprecated in newer pandas (DataFrame.map does the same)
mask_str = (df[['Age', 'Salary']]
            .applymap(lambda x: str(type(x)))
            .sum(axis=1)
            .str.contains("str"))
df[~mask_str]

This is assuming that the dataframe already contains the proper types. If not, you can convert them using the following:

def convert(val):
    try:
        return int(val)
    except ValueError:
        return val

df = (df.assign(Age=lambda f: f.Age.apply(convert), 
                Salary=lambda f: f.Salary.apply(convert)))
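Putting the two snippets together end to end, a sketch (sample data abbreviated from the question, and `Series.map` used as a version-robust stand-in for the deprecated `applymap`):

```python
import pandas as pd

def convert(val):
    # Cast to int where possible; otherwise keep the original value
    try:
        return int(val)
    except ValueError:
        return val

df = pd.DataFrame({
    'Org_Name': ['Axempl', 'Foxtrix'],
    'Age':      ['29', 'thirty'],
    'Salary':   ['1000', '2000'],
})

# Normalise the types first, then drop any row where a str survived
df = df.assign(Age=df.Age.apply(convert), Salary=df.Salary.apply(convert))
mask_str = df[['Age', 'Salary']].apply(lambda s: s.map(type).eq(str)).any(axis=1)
df = df[~mask_str]
print(df)
```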
Horace