
I have a data frame of numeric variables (such as temp, speed, etc.) on which I'm trying to run a few pieces of code, such as replacing outliers with the mean and creating a scatterplot. However, I keep getting the error in the title. I'm not sure where I'm going wrong, as this code has worked on other data frames.

Here's an example of my data frame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'temp': [.2, np.nan, .12],
                   'speed': [1, 1, 0],
                   'weekday': [1, 2, 3]})

Here's the actual code I'm using (step 1 just imports the data and works fine):

import pandas as pd
cars = pd.read_csv("C:/Users/Downloads/file.csv")

Step 2 is where I begin having issues:

import numpy as np

outliers = []
outliers.append(cars[['temp', 'speed']])

for j in outliers:
    upper_quartile = np.nanpercentile(cars[j], 75)
    lower_quartile = np.nanpercentile(cars[j], 25)
    iqr = upper_quartile - lower_quartile

    upper_whisker = upper_quartile + 1.5*iqr
    lower_whisker = np.maximum(lower_quartile - 1.5*iqr, 0)

    cars[j] = np.where((cars[j] <= lower_whisker) | 
                      (cars[j] >= upper_whisker), np.nan, cars[j])

This should replace the outliers with NaN, but when I run it I get the Boolean data frame error message. I get the same error when running this next bit, which replaces those missing values with the column's mean:

for v in outliers:
    cars[v].fillna(cars[v].mean(), True)
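A minimal sketch of the two loops above, assuming the intent was to loop over column *names*. As written, `outliers.append(cars[['temp', 'speed']])` stores a whole sub-DataFrame, so `cars[j]` indexes `cars` with a DataFrame, which pandas interprets as a boolean mask; that is one plausible source of the boolean-values error, though it has not been checked against the real `file.csv`. The `cars` frame below is made-up stand-in data, and the question's `<=`/`>=` comparisons and clamp at 0 are kept unchanged:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for file.csv (hypothetical values, one obvious outlier per column).
cars = pd.DataFrame({
    "temp": [0.20, np.nan, 0.12, 0.15, 0.18, 5.0],
    "speed": [30, 32, 31, 29, 33, 90],
    "weekday": [1, 2, 3, 4, 5, 6],
})

# Column names, not a sub-DataFrame: cars[j] then returns a Series.
outliers = ["temp", "speed"]

for j in outliers:
    upper_quartile = np.nanpercentile(cars[j], 75)
    lower_quartile = np.nanpercentile(cars[j], 25)
    iqr = upper_quartile - lower_quartile

    upper_whisker = upper_quartile + 1.5 * iqr
    lower_whisker = np.maximum(lower_quartile - 1.5 * iqr, 0)

    # Values on or beyond either whisker become NaN (same rule as the question).
    cars[j] = np.where((cars[j] <= lower_whisker) |
                       (cars[j] >= upper_whisker), np.nan, cars[j])

# Fill the NaNs with each column's mean, assigning the result back.
for v in outliers:
    cars[v] = cars[v].fillna(cars[v].mean())

print(cars)
```

Assigning the `fillna` result back to the column avoids relying on `inplace=True` on a column selection, which may act on a copy rather than on `cars` itself.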
Jess
  • Show your code snippet producing an error. – Roman Anderson Sep 25 '19 at 22:14
  • `cars[v].fillna[cars[v].mean(), TRUE]` ---> `fillna` is a method that takes arguments, not an accessor that takes indices. (`()` vs `[]`) – Paul H Sep 25 '19 at 22:18
  • `cars = pd.read_csv("C:/Users/Downloads/file.csv")` <---- No one but you has this file. What are we supposed to do with this line? See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Sep 25 '19 at 22:18
  • I just put that in there because I'm not sure whether there's some way I'm supposed to import it that would prevent this error. – Jess Sep 25 '19 at 22:21
  • Hi Paul, I switched to `()` and I am still getting the error. – Jess Sep 25 '19 at 22:23
  • What is `TRUE`, by the way? Did you define it, or did you mean `True`? – jottbe Sep 26 '19 at 08:02
  • @paul I switched to `()` and I am still getting the error. I think the issue lies with the Boolean data frame error rather than the code, as I copied and pasted several pieces of code I had used successfully on different data frames (which were not Boolean, i.e. they had variables such as height and weight holding numbers rather than T/F or 0/1), but for some reason it's not working on this data frame. I am very new to Python and tried researching this myself, but I just don't understand what is going on or how to fix it. – Jess Sep 26 '19 at 15:46
  • @jottbe Sorry, it probably should be `True`; I just edited my code (I was using `True` for the `inplace` argument). Regardless, the main issue I am trying to understand is the Boolean error, as I've copied and pasted three different bits of code I used successfully on other data frames, but for some reason they aren't working with this one. They all seem to have similar structures, i.e. they are NOT Boolean (variables such as height, weight, date), so I'm not sure why the error suddenly pops up here every time. – Jess Sep 26 '19 at 15:49
  • See the link I posted earlier? Read through it, build a reproducible example with some sample data embedded, and show us the expected output you compute by hand. – Paul H Sep 26 '19 at 16:32
  • @PaulH I edited the question to show all my code and an example of the data frame. – Jess Sep 26 '19 at 19:10

1 Answer


I'm not sure this will solve your issue, but I ran into a similar problem, and the cause turned out to be that my DataFrame of True/False values was stored with the generic object dtype in pandas.

In the traceback below, you can see pandas checking whether the mask has a boolean dtype. Although all of my values were in fact True/False, the way I constructed the DataFrame left that column stored as the generic object dtype, so the check failed.

/opt/conda/lib/python3.8/site-packages/pandas/core/frame.py in _setitem_frame(self, key, value)
   3204 
   3205         if key.size and not is_bool_dtype(key.values):
-> 3206             raise TypeError(
   3207                 "Must pass DataFrame or 2-d ndarray with boolean values only"
   3208             )
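The check in that traceback looks at the dtype, not at the values. A minimal sketch, assuming a made-up one-column mask and using `pandas.api.types.is_bool_dtype`, the public re-export of the `is_bool_dtype` function called above; `.astype(object)` is only a stand-in for whatever construction produced the object column:

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# A one-column True/False mask (made-up data) and an object-dtype copy of it,
# standing in for a mask whose construction left it as generic objects.
mask_bool = pd.DataFrame({"flag": [True, False, True]})
mask_obj = mask_bool.astype(object)

# Same True/False values in both frames, but only the bool column passes.
bool_check = is_bool_dtype(mask_bool["flag"].dtype)
obj_check = is_bool_dtype(mask_obj["flag"].dtype)
print(bool_check, obj_check)  # True False
```

Inspecting `mask.dtypes` before using a DataFrame as a mask is a quick way to spot this case.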

In the end, the solution was simply to cast that DataFrame to bool dtype with `astype(bool)`:

Before

df.values
array([[False],
       [False],
       [False],
       ...,
       [True],
       [True],
       [True]], dtype=object) # notice the object dtype

After

df.astype(bool).values
array([[False],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
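A self-contained sketch of the failure and the fix, assuming a made-up two-column frame and a mask forced to object dtype with `.astype(object)` (the answer does not say how its own mask was built). The exact error text may differ between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# True/False mask stored as object dtype (hypothetical construction).
mask = (df > 2.5).astype(object)

# Boolean-mask assignment with the object-dtype mask is rejected.
try:
    df[mask] = np.nan
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("object mask:", error_message)

# Casting the mask to bool lets the same assignment go through.
df[mask.astype(bool)] = np.nan
print(df)
```

The failed attempt raises before touching `df`, so only the second assignment changes it.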
Charles Naccio