
This is a sample of my data

[Image: sample of the data]

I have written this code, which removes all categorical columns (e.g. MSZoning). However, some non-categorical columns contain NA values. How can I exclude them from my data set?

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def main():
    print('Starting program execution')
    iowa_train_prices_file_path='C:\\...\\programs\\python\\kaggle_competition_iowa_house_prices_train.csv'
    iowa_file_data = pd.read_csv(iowa_train_prices_file_path)
    print('Read file')

    model_random_forest = RandomForestRegressor(random_state=1)
    features = ['MSSubClass','MSZoning',...]
    y = iowa_file_data.SalePrice
    # every column except SalePrice
    X = iowa_file_data.drop('SalePrice', axis = 1)
    # The object dtype indicates a column has text (a hint that the column is categorical)
    X_dropped = X.select_dtypes(exclude=['object'])
    print("fitting model")
    model_random_forest.fit(X_dropped, y)

    print("MAE of dropped categorical approach")


pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
main()

When I run the program, I get the error ValueError: Input contains NaN, infinity or a value too large for dtype('float32'), which I believe is caused by the NA value in the row with Id=8.

Question 1 - How do I remove such rows entirely?

Question 2 - What is the type of columns that are mostly numbers but have text in between? I thought I would use print("X types",type(X.columns)), but that doesn't give the result.

martineau
Manu Chadha

1 Answer


To get rid of the NaNs, you can replace them with another value. Zero is a common choice.

iowa_file_data = iowa_file_data.fillna(0)
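
As a rough sketch of what this does, the snippet below fills only the numeric columns of a small made-up frame. The column hypothetical_area and all values are assumptions for illustration, not taken from the real data set; restricting the fill to numeric columns is an optional variation on the one-liner above, which fills every column.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; "hypothetical_area" and all values are made up.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "hypothetical_area": [65.0, np.nan, 80.0],
    "MSZoning": ["RL", None, "RM"],
})

# Fill NaN only in the numeric columns, leaving text columns untouched.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(0)

print(df)
```

Note that filling with zero changes the data the model sees, so it is worth checking whether zero is a sensible value for each column.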

If you would rather remove every column that contains a NaN, use

iowa_file_data = iowa_file_data.dropna(axis='columns')

And if you want to remove every row that contains a NaN, use

iowa_file_data = iowa_file_data.dropna()
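
One thing to watch when dropping rows: the drop has to happen before the features and the target are separated, otherwise X and y end up with different lengths. The sketch below, on a made-up frame (hypothetical_area and all values are assumptions), keeps only the numeric columns first so that an NA in a text column does not cause a row to be discarded.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; "hypothetical_area" and all values are made up.
data = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "hypothetical_area": [65.0, 80.0, np.nan, 60.0],
    "MSZoning": ["RL", None, "RM", "RL"],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Keep the numeric columns (this also drops the categorical ones),
# then drop every row that still has a NaN.
clean = data.select_dtypes(include="number").dropna()

# Split into target and features only after the rows were dropped,
# so both keep exactly the same rows.
y = clean["SalePrice"]
X = clean.drop("SalePrice", axis=1)

print(X)
print(y)
```

Here the row with Id=3 is removed because its numeric value is missing, while the row with Id=2 survives even though its MSZoning is missing.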

For your second question, from what I understand, you might want to see some info about the pandas object dtype: link.
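
To inspect the type of each column, the dtypes attribute is probably what you are after; type(X.columns) only reports the type of the column index itself. A minimal sketch on made-up data follows (both column names are hypothetical). Keep in mind that pd.read_csv treats the string NA as a missing value by default, so a numeric column with NA entries normally arrives as float64 containing NaN rather than as text.

```python
import numpy as np
import pandas as pd

# Made-up columns for illustration only.
df = pd.DataFrame({
    "numeric_with_gap": [65.0, np.nan, 80.0],      # numbers with a missing value
    "mostly_numbers": ["2003", "unknown", "1976"],  # numbers stored as text
})

print(type(df.columns))  # type of the column index, not of the columns
print(df.dtypes)         # one dtype per column

# Convert the text column to numbers; entries that cannot be parsed become NaN.
df["mostly_numbers"] = pd.to_numeric(df["mostly_numbers"], errors="coerce")
print(df.dtypes)
```

After the conversion, the former text column can be handled with fillna or dropna like any other numeric column.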

Alex Metsai