
I am working on a dataset in which a few values in one of the columns are strings. Because of that, I get errors when performing operations on the dataset.

Sample dataset:

1.99    LOHARU  0.3 2   0   2   0.3 5   2   0   2   2
1.99    31  0.76    2   0   2   0.76    5   2   7.48    4   2
1.99    4   0.96    2   0   2   0.96    5   2   9.45    4   2
1.99    14  1.26    4   0   2   1.26    5   2   0   2   2
1.99    NUH 0.55    2   0   2   0.55    5   2   0.67    2   2
1.99    99999   0.29    2   0   2   0.29    5   2   0.06    2   2

The full dataset can be found here: https://www.kaggle.com/sid321axn/audit-data?select=trial.csv

I need to find the missing values and outliers in the dataset. Below is the code I am using to find missing values:

# Replace zeros and 99999 with NaN

dataset[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]]=dataset[[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]].replace(99999,np.nan)

# If columns 12, 14 and 17 can contain zeros, then
dataset[[0,1,2,3,4,5,6,7,8,9,10,11,13,15,16]]=dataset[[0,1,2,3,4,5,6,7,8,9,10,11,13,15,16]].replace(0,np.nan)

print(dataset.isnull().sum())

But this doesn't replace 99999 with NaN.

To find outliers, I am calculating the z-score:

import scipy.stats as stats
array = dataset.values
Z = stats.zscore(array)

But it gives me the following error: `TypeError: unsupported operand type(s) for /: 'str' and 'int'`

ashish_goy
    What operations? What are you trying to do? What errors are you getting when you attempt to do so? Are you looking to [Change column type in pandas](https://stackoverflow.com/q/15891038/15497888) `to_numeric` and remove the non-numeric values? – Henry Ecker Sep 27 '21 at 22:02

1 Answer


IIUC, you want to remove the non-numeric values. For this you can use `pandas.to_numeric` with the `errors='coerce'` option. This replaces non-numeric values with NaN and lets you perform numeric operations:

df = df.apply(pd.to_numeric, errors='coerce')

output:

   col1  col2  col3  col4  col5  col6  col7  col8  col9  col10  col11  col12
0  1.99   NaN  0.30     2     0     2  0.30     5     2   0.00      2      2
1  1.99  31.0  0.76     2     0     2  0.76     5     2   7.48      4      2
2  1.99   4.0  0.96     2     0     2  0.96     5     2   9.45      4      2
3  1.99  14.0  1.26     4     0     2  1.26     5     2   0.00      2      2
4  1.99   NaN  0.55     2     0     2  0.55     5     2   0.67      2      2
5  1.99   5.0  0.29     2     0     2  0.29     5     2   0.06      2      2
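
Once every column is numeric, the two follow-up steps from the question (marking 99999 as missing and computing z-scores) should also work. Below is a minimal sketch under some assumptions: the three-column frame and its column names are hypothetical stand-ins for the real data, 99999 is treated as a missing-value sentinel as in the question, and the z-score is computed with plain pandas arithmetic rather than `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring a few columns of the rows in the question;
# the column names are made up for illustration.
raw = pd.DataFrame({
    "col2": ["LOHARU", "31", "4", "14", "NUH", "99999"],
    "col3": [0.3, 0.76, 0.96, 1.26, 0.55, 0.29],
    "col10": [0, 7.48, 9.45, 0, 0.67, 0.06],
})

# Text such as "LOHARU" becomes NaN; numeric strings such as "99999" become numbers.
df = raw.apply(pd.to_numeric, errors="coerce")

# With a numeric dtype, the 99999 sentinel now matches and can be replaced.
df = df.replace(99999, np.nan)

# Count missing values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Column-wise z-scores; mean() and std() skip NaN by default.
# ddof=0 is the population standard deviation, the default in scipy.stats.zscore.
z = (df - df.mean()) / df.std(ddof=0)
print(z)
```

The earlier `replace(99999, np.nan)` likely had no effect because the column was read as strings, so the integer 99999 never matched the text "99999". If you prefer to stay with SciPy, `stats.zscore` accepts `nan_policy='omit'`, which should ignore the NaN values in a similar way.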
mozway