I am working on a regression problem, namely the Boston House prediction problem hosted on Kaggle. I am currently using a Random Forest Classifier with SelectFromModel to reduce the dimensions of my dataset, but I'm getting the following error:

Traceback (most recent call last):
  File "C:/Users/security/Downloads/AP/Boston-Kaggle/Model.py", line 96, in <module>
    print("The selected values from the test set are: " + test[selected])
  File "C:\Users\security\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 2918, in __getitem__
    return self._getitem_bool_array(key)
  File "C:\Users\security\AppData\Roaming\Python\Python37\site-packages\pandas\core\frame.py", line 2963, in _getitem_bool_array
    (len(key), len(self.index)))
ValueError: Item wrong length 303 instead of 1459.

I don't understand why it's asking for 1459 items. This is the chunk of code the error comes from:

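import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

test = pd.read_csv("https://raw.githubusercontent.com/oo92/Boston-Kaggle/master/test.csv")
# ... a lot of code in between
sel = SelectFromModel(RandomForestClassifier(n_estimators=100), threshold='0.5*mean')
sel.fit(x_train, y_train)

# Boolean mask with one entry per feature of x_train
selected = sel.get_support()

print("The selected values from the test set are: " + test[selected])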

Update

test.head():

     Id  MSSubClass MSZoning  ...  YrSold  SaleType SaleCondition
0  1461          20       RH  ...    2010        WD        Normal
1  1462          20       RL  ...    2010        WD        Normal
2  1463          60       RL  ...    2010        WD        Normal
3  1464          60       RL  ...    2010        WD        Normal
4  1465         120       RL  ...    2010        WD        Normal

[5 rows x 80 columns]

print(selected):

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True False  True
  True  True  True  True  True  True  True  True  True False  True  True
  True False  True False  True  True False False False False  True  True
  True False  True False  True False  True False False  True  True  True
 False  True  True  True False False False False False False  True  True
  True  True False False  True  True False  True  True  True  True False
  True  True  True False  True False  True  True  True False False False
 False False False False False False False False False False False  True
 False False False  True  True False  True False False  True False  True
 False  True False  True False False False False False False False False
 False False False False False False False  True False  True  True False
 False  True  True False False False False False False  True  True False
  True False  True False False  True  True False False  True  True  True
 False False False  True  True False False  True False  True  True  True
  True False False False  True False  True  True False  True  True False
  True False  True  True  True  True False  True  True  True  True  True
  True False False False False  True  True  True False False False False
 False False False  True False  True False  True False False  True False
 False  True False  True False  True  True False False False False False
 False  True False False  True False  True  True False  True False  True
 False  True False  True  True  True False False False False False  True
 False False False False False  True False  True False  True False False
 False False  True  True  True False  True False False  True False  True
  True False False False False False  True False  True  True False False
 False  True  True]
Onur-Andros Ozbek

1 Answer

Based on the post "Select from pandas dataframe using boolean series/array", you can change print("The selected values from the test set are: " + test[selected]) to:

s = pd.Series(selected, name='bools')
print("The selected values from the test set are: " + test[s.values])

This will give you the DataFrame back, filtered to where selected is True. If you want to access a specific column, you can change it to:

s = pd.Series(selected, name='bools')
testA = test[s.values]
print("The selected values from the test set are: " + testA['columnname'])
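
A caveat worth checking: the traceback reports a key of length 303 against an index of length 1459, and sel.get_support() returns one boolean per feature column of x_train, so the mask appears to describe columns, while test[...] with a boolean array filters rows. Wrapping the mask in a Series and taking .values yields the same array, so the length check would likely still fail. Below is a minimal sketch of column-wise selection with .loc. The toy frames, the hard-coded mask standing in for sel.get_support(), and the reindex step that gives the test frame the training columns are all assumptions for illustration, not your actual preprocessing.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumed data): 5 training rows, 3 feature columns.
x_train = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                        "b": [5, 4, 3, 2, 1],
                        "c": [0, 1, 0, 1, 0]})
x_test = pd.DataFrame({"a": [9, 8], "c": [1, 1]})  # column "b" is missing here

# Hypothetical stand-in for sel.get_support(): one boolean per training column.
selected = np.array([True, False, True])

# frame[bool_array] is a row filter, so the mask length must equal the number
# of rows; with a per-column mask this raises the "Item wrong length" error.
try:
    x_test[selected]
    row_filter_error = None
except ValueError as exc:
    row_filter_error = str(exc)

# Give the test frame the same columns, in the same order, as the training
# frame (assumption: missing columns can be filled with 0).
x_test_aligned = x_test.reindex(columns=x_train.columns, fill_value=0)

# Apply the mask along the column axis instead of the row axis.
selected_test = x_test_aligned.loc[:, selected]

print("Row-filter error:", row_filter_error)
print("Selected columns:", list(selected_test.columns))
```

In your script this would correspond to something like x_test.loc[:, selected], where x_test is a hypothetical name for the test set after the same preprocessing that produced the 303 columns of x_train. Passing the label and the DataFrame to print as separate arguments, as above, also avoids adding a string to a DataFrame.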
PV8