
I have a DataFrame named data2 which consists of 583 observations and 11 variables. There are outliers in the data. I want to impute the outliers of three of my variables, named a, b and c (all of type int64), using the IQR rule and mean imputation. I created two variables from data2, Q1 and Q3:

Q1 = data2[['a','b','c']].quantile(0.25)
Q3 = data2[['a','b','c']].quantile(0.75)
IQR = Q3 - Q1
print (IQR)
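To make the shapes involved concrete, here is the same computation on a small hypothetical stand-in for data2 (the values are invented for illustration). It shows that Q1, Q3 and IQR are each a pandas Series with one entry per column, indexed by the labels "a", "b" and "c", not single numbers. That detail is what the error reported further down hinges on.

```python
import pandas as pd

# Hypothetical stand-in for data2: only the three int64 columns of interest
data2 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 40],
    "b": [10, 12, 11, 13, 12, 11, 14, 13, 12, 30],
    "c": [5, 5, 6, 6, 7, 7, 8, 8, 9, 9],
})

Q1 = data2[["a", "b", "c"]].quantile(0.25)
Q3 = data2[["a", "b", "c"]].quantile(0.75)
IQR = Q3 - Q1

# Each result is a Series labeled "a", "b", "c" (one value per column)
print(IQR)
```

With the default linear interpolation, this toy frame gives an IQR of 4.5 for a, 1.75 for b and 2.0 for c.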

Then I defined two more variables, lower_limit and upper_limit:

lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR
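A minimal sketch of how these limits can be used to flag outliers, again assuming a small invented frame in place of data2. Because lower_limit and upper_limit are Series labeled by column, comparing the whole DataFrame against them lines each label up with the matching column, so every column is checked against its own limits in one vectorized step.

```python
import pandas as pd

# Hypothetical stand-in for data2 (invented toy values)
data2 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 40],
    "b": [10, 12, 11, 13, 12, 11, 14, 13, 12, 30],
    "c": [5, 5, 6, 6, 7, 7, 8, 8, 9, 9],
})
cols = ["a", "b", "c"]

Q1 = data2[cols].quantile(0.25)
Q3 = data2[cols].quantile(0.75)
IQR = Q3 - Q1
lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR

# DataFrame-vs-Series comparison matches the Series labels to the columns,
# giving a boolean DataFrame that is True wherever a value is an outlier
is_outlier = (data2[cols] < lower_limit) | (data2[cols] > upper_limit)

print(lower_limit)
print(upper_limit)
print(is_outlier.sum())  # number of flagged values per column
```

Here only the value 40 in a and the value 30 in b fall outside their limits.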

Then I computed the mean values of a, b and c:

mean_value = data2[['a','b','c']].mean()
print(mean_value)

Then I created a function:

def imputer(value):
    if value < lower_limit or value > upper_limit:
        return mean_value
    else:
        return value

Now I try to apply the imputer function I created earlier to those columns of the dataframe:

results = data2[['a','b','c']].apply(imputer) #Error Line

It raises this error: ValueError: Can only compare identically-labeled Series objects
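The error can be reproduced on a tiny hypothetical frame (the data and limits below are made up for illustration). DataFrame.apply with the default axis=0 calls the function once per column, so `value` is a whole column (a Series indexed 0, 1, 2, ...), while lower_limit is a Series indexed "a", "b", "c". pandas refuses to compare two Series whose labels differ, which is exactly this ValueError. Even with matching labels, `if` on a boolean Series would fail with "The truth value of a Series is ambiguous", and returning mean_value would return all three means rather than one replacement value.

```python
import pandas as pd

# Hypothetical frame and limits, only to reproduce the error
data2 = pd.DataFrame({"a": [1, 2, 40], "b": [10, 11, 30], "c": [5, 6, 7]})
lower_limit = pd.Series({"a": 0.0, "b": 5.0, "c": 0.0})
upper_limit = pd.Series({"a": 20.0, "b": 20.0, "c": 20.0})
mean_value = data2[["a", "b", "c"]].mean()


def imputer(value):
    # `value` is a full column here, not a single number
    print(type(value).__name__, list(value.index))
    if value < lower_limit or value > upper_limit:
        return mean_value
    else:
        return value


message = None
try:
    data2[["a", "b", "c"]].apply(imputer)
except ValueError as exc:
    # index [0, 1, 2] of the column vs index ["a", "b", "c"] of lower_limit
    message = str(exc)

print(message)
```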

Any help is appreciated.

1 Answer


I tried changing the axis argument of the apply method, and I also looked at the Series where method, but neither helped. What I came up with in the end is to skip your imputer function and do the replacement column by column:

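for col in ['a', 'b', 'c']:
    # replace values outside [lower_limit, upper_limit] with the column mean
    # (a value can only be below the lower OR above the upper limit, so `or`, not `and`)
    data2[col] = data2.apply(lambda row: mean_value[col] if (row[col] < lower_limit[col] or row[col] > upper_limit[col]) else row[col], axis=1)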

I know this is probably inefficient, since apply with axis=1 calls a Python function once per row. If anyone has a more efficient answer, or a way to make your original approach work, that would be great.
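One possible way to keep the imputer idea from the question while avoiding a per-row Python call is sketched below, on an invented stand-in for data2. The function is rewritten to accept a whole column: since DataFrame.apply passes each column with its label in `column.name`, the scalar limits and mean for that column can be looked up, and Series.mask replaces only the flagged values. This is a sketch under those assumptions, not the only approach.

```python
import pandas as pd

# Hypothetical stand-in for data2 with the three int64 columns
data2 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 40],
    "b": [10, 12, 11, 13, 12, 11, 14, 13, 12, 30],
    "c": [5, 5, 6, 6, 7, 7, 8, 8, 9, 9],
})
cols = ["a", "b", "c"]

Q1 = data2[cols].quantile(0.25)
Q3 = data2[cols].quantile(0.75)
IQR = Q3 - Q1
lower_limit = Q1 - 1.5 * IQR
upper_limit = Q3 + 1.5 * IQR
mean_value = data2[cols].mean()


def imputer(column):
    # apply (axis=0) passes one column at a time; column.name is its label
    name = column.name
    is_outlier = (column < lower_limit[name]) | (column > upper_limit[name])
    # Series.mask swaps in the column mean wherever is_outlier is True
    return column.mask(is_outlier, mean_value[name])


data2[cols] = data2[cols].apply(imputer)
print(data2)
```

The function now runs three times (once per column) instead of once per row, and each comparison is vectorized. As in the question, mean_value is computed before imputation, so it still includes the outliers, and columns that receive a non-integer mean become float64.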

GadaaDhaariGeek