I am using pandas and numpy.
I want to remove every column in my 9000 x 13 training data frame where at least 20% of the entries are -200. In this data, -200 is a placeholder for a missing value (like NaN), so columns with that many placeholders aren't useful to me. A sample of the data is below. Any help would be appreciated.
This is my attempt so far, but it only checks for real NaN values, not -200:
train_mod = train.loc[:, train.isnull().mean() < .2]
        A        B     C          D        E       F  \
5723  0.5   846.25  -200   2.619270   627.50    79.0
4014  1.5  1016.25  -200   6.810175   848.50    99.0
4074  2.0  -200.00  -200   -200.000  -200.00   114.0
4577  1.6   950.50  -200   8.649763   925.50   351.0
6691  4.7  1469.75  -200  25.820425  1449.75   677.0
2889  0.5   902.50  -200   2.676091   631.25  -200.0
4387  2.0  1095.75  -200  12.972673  1082.75   310.0
4289  1.0   885.50  -200   2.695146   632.50  -200.0
2887  2.3  1355.00  -200  16.611225  1198.25   129.0
5694  1.1   936.25  -200   6.821513   849.00   127.0
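
For reference, here is a minimal sketch of one way this could be done, assuming -200 is the only missing-value marker. The small `train` frame is a stand-in built from the ten sample rows above (columns A to F only), and `sentinel_share` is a name chosen just for this illustration. The idea is to compare every cell to -200 and take the column-wise mean of the resulting booleans, which gives the fraction of -200 entries per column.

```python
import pandas as pd

# Stand-in for the real 9000 x 13 frame: the ten sample rows, columns A-F only.
train = pd.DataFrame(
    {
        "A": [0.5, 1.5, 2.0, 1.6, 4.7, 0.5, 2.0, 1.0, 2.3, 1.1],
        "B": [846.25, 1016.25, -200.0, 950.5, 1469.75,
              902.5, 1095.75, 885.5, 1355.0, 936.25],
        "C": [-200] * 10,
        "D": [2.619270, 6.810175, -200.0, 8.649763, 25.820425,
              2.676091, 12.972673, 2.695146, 16.611225, 6.821513],
        "E": [627.5, 848.5, -200.0, 925.5, 1449.75,
              631.25, 1082.75, 632.5, 1198.25, 849.0],
        "F": [79.0, 99.0, 114.0, 351.0, 677.0,
              -200.0, 310.0, -200.0, 129.0, 127.0],
    },
    index=[5723, 4014, 4074, 4577, 6691, 2889, 4387, 4289, 2887, 5694],
)

# Fraction of entries equal to -200 in each column (mean of a boolean mask).
sentinel_share = train.eq(-200).mean()

# Keep only columns where fewer than 20% of the entries are -200,
# i.e. drop columns where at least 20% are -200.
train_mod = train.loc[:, sentinel_share < 0.2]

print(sentinel_share)
print(train_mod.columns.tolist())
```

On this sample, C (all -200) and F (2 of 10 rows) are dropped, while B, D and E (1 of 10 rows each) are kept. An alternative under the same assumption would be to convert the placeholder first with `train.replace(-200, np.nan)`, after which the original `isnull().mean()` approach applies unchanged.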