I have a data frame of various variables with numeric values (such as temp, speed, etc.) on which I am trying to run a few pieces of code, such as replacing outliers with the mean and creating a scatterplot. However, I keep getting the error I referenced in the title... I'm not sure where I'm going wrong, as this code has worked on other data frames.
Here's an example of my data frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp': [.2, np.nan, .12],
                   'speed': [1, 1, 0],
                   'weekday': [1, 2, 3]})
Here's the actual code I'm using (step #1 is just importing it and works fine):
import pandas as pd
cars = pd.read_csv("C:/Users/Downloads/file.csv")
Step 2 is where I begin having issues:
import numpy as np
outliers = []
outliers.append(cars[['temp', 'speed']])
for j in outliers:
    upper_quartile = np.nanpercentile(cars[j], 75)
    lower_quartile = np.nanpercentile(cars[j], 25)
    iqr = upper_quartile - lower_quartile
    upper_whisker = upper_quartile + 1.5*iqr
    lower_whisker = np.maximum(lower_quartile - 1.5*iqr, 0)
    cars[j] = np.where((cars[j] <= lower_whisker) |
                       (cars[j] >= upper_whisker), np.nan, cars[j])
This should replace outliers with NaN, but when I run it I get the Boolean data frame error message. I get the same error when running this next bit, which is meant to replace those missing values with the column's mean:
for v in outliers:
    cars[v].fillna(cars[v].mean(), True)
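For comparison, here is a minimal sketch of what the two loops are meant to do, assuming the loop should iterate over column names rather than over a list that holds a whole DataFrame. The sample values and the `columns` list are made up for illustration (they are not from my CSV), and `Series.mask` is swapped in for `np.where`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV; the values are invented for illustration.
cars = pd.DataFrame({'temp': [0.2, np.nan, 0.12, 0.15, 0.18, 5.0],
                     'speed': [10, 12, 11, 13, 12, 11],
                     'weekday': [1, 2, 3, 4, 5, 6]})

# Assumption: iterate over column *names*, so cars[col] is a single Series.
columns = ['temp', 'speed']

# Step 2: replace values outside the whiskers with NaN.
for col in columns:
    upper_quartile = np.nanpercentile(cars[col], 75)
    lower_quartile = np.nanpercentile(cars[col], 25)
    iqr = upper_quartile - lower_quartile
    upper_whisker = upper_quartile + 1.5 * iqr
    lower_whisker = np.maximum(lower_quartile - 1.5 * iqr, 0)
    is_outlier = (cars[col] <= lower_whisker) | (cars[col] >= upper_whisker)
    cars[col] = cars[col].mask(is_outlier)  # True positions become NaN

# Step 3: fill the NaNs with each column's mean (mean() skips NaN by default).
for col in columns:
    cars[col] = cars[col].fillna(cars[col].mean())
```

With this made-up data only the 5.0 in `temp` falls outside the whiskers, so it and the original NaN both end up as the mean of the remaining `temp` values.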