I would like to build a Boolean Series from multiple conditions and then use it to subset the original DataFrame.

The code below returns a DataFrame rather than a Boolean Series.

#remove outliers
minInCollection = myDataFrame[  
    (myDataFrame.Age>myDataFrame.Age.min()) & 
    (myDataFrame.Age<myDataFrame.Age.max()) &
    (myDataFrame.Paid_Off_In_Days>myDataFrame.Paid_Off_In_Days.min()) &
    (myDataFrame.Paid_Off_In_Days<myDataFrame.Paid_Off_In_Days.max())
    ]

print("type is ", str(type(minInCollection)))
  <class 'pandas.core.frame.DataFrame'>
bibscy

1 Answer

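You are very close. If you want the Boolean Series returned, drop the outer `myDataFrame[ ... ]` indexing brackets. The combined condition is already a Boolean Series; wrapping it in `myDataFrame[ ... ]` uses it to select rows, which is why you get a DataFrame back. See the example code below.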

### Make up data
import pandas as pd

colA = [20, 30, 17, 30, 22, 27, 30, 24]
myDataFrame = pd.DataFrame({'Age': colA})

# No outer myDataFrame[...] here, so the result is the Boolean Series itself
minInCollection = (myDataFrame.Age > myDataFrame.Age.min()) & (myDataFrame.Age < myDataFrame.Age.max())
print(minInCollection)
print(type(minInCollection))
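
To connect this back to the two-column case in the question, here is a minimal sketch that keeps the mask and the subset as separate steps. The column names come from the question; the values, and the variable names `age`, `days`, `mask`, and `filtered`, are made up for illustration.

```python
import pandas as pd

# Made-up data: column names follow the question, values are illustrative only.
myDataFrame = pd.DataFrame({
    "Age": [20, 30, 17, 30, 22, 27, 30, 24],
    "Paid_Off_In_Days": [5, 12, 9, 30, 1, 14, 20, 7],
})

age = myDataFrame["Age"]
days = myDataFrame["Paid_Off_In_Days"]

# Each comparison yields a Boolean Series; & combines them element-wise.
mask = (
    (age > age.min()) & (age < age.max())
    & (days > days.min()) & (days < days.max())
)

# Indexing the DataFrame with the mask keeps only the rows where it is True.
filtered = myDataFrame[mask]

print(type(mask))      # the Boolean Series
print(type(filtered))  # the subsetted DataFrame
```

Keeping `mask` in its own variable gives you both objects: the Boolean Series to inspect or reuse, and the filtered DataFrame when you need it.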

a11

One unrelated question: I need to find out whether `Age`, `Level of Education` (high school, Bachelor, Master), and `Amount Borrowed` influence `Loan_Paid_Off_In_X_Days`. How can I visualize this? I have all the data in a dataframe. I understand that a scatter plot is used for only two variables. How should I go about representing the above visually? – bibscy Jul 14 '20 at 09:48