Exclude rows which have NA value for a column

Question

This is a sample of my data

I have written this code, which removes all categorical columns (e.g. MSZoning). However, some non-categorical columns contain NA values. How can I exclude the rows with NA values from my data set?

import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def main():
    print('Starting program execution')
    iowa_train_prices_file_path='C:\\...\\programs\\python\\kaggle_competition_iowa_house_prices_train.csv'
    iowa_file_data = pd.read_csv(iowa_train_prices_file_path)
    print('Read file')

    model_random_forest = RandomForestRegressor(random_state=1)
    features = ['MSSubClass','MSZoning',...]
    # target column
    y = iowa_file_data.SalePrice
    # every column except SalePrice
    X = iowa_file_data.drop('SalePrice', axis = 1)
    # The object dtype indicates a column has text (hint that the column is categorical)
    X_dropped = X.select_dtypes(exclude=['object'])
    print("fitting model")
    # this call raises the ValueError when X_dropped still contains NaN
    model_random_forest.fit(X_dropped, y)

    print("MAE of dropped categorical approach")


pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
main()

When I run the program, I get the error ValueError: Input contains NaN, infinity or a value too large for dtype('float32'), which I believe is caused by the NA value in the row with Id=8.

Question 1 - How do I remove such rows entirely?

Question 2 - What is the type of columns that are mostly numbers but have text in between? I thought I would do print("X types",type(X.columns)), but that doesn't give the result.

Alex Metsai · Accepted Answer · 2021-03-01T07:03:52.143

3

To get rid of NaNs, you can replace them with another value. Zero is a common choice, though it changes the distribution of the column, so a column mean or median is sometimes preferred.

iowa_file_data = iowa_file_data.fillna(0)
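
Before filling, it can help to check which columns actually contain NaNs. The following is a minimal sketch on a made-up frame, not the real Kaggle file: the LotFrontage column name and all values are assumptions chosen for illustration, and the median fill is shown only as one possible alternative to zero.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the data; column LotFrontage and all values are invented.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Count missing values per column and keep only the columns that have any.
nan_counts = df.isna().sum()
columns_with_nan = nan_counts[nan_counts > 0].index.tolist()

# Option 1: replace every NaN with zero.
filled_zero = df.fillna(0)

# Option 2: replace NaNs with the median of their own column.
filled_median = df.fillna(df.median(numeric_only=True))
```

Which fill value is appropriate depends on what a missing entry means in the data, so treat both options as starting points.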

If you would rather remove every column that contains a NaN, use

iowa_file_data = iowa_file_data.dropna(axis='columns')

And if you want to remove every row that contains a NaN in any column, use

iowa_file_data = iowa_file_data.dropna()
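
To drop rows based on one particular column only, dropna also accepts a subset argument. A small sketch under stated assumptions: the frame below is invented, and LotFrontage and MasVnrArea are hypothetical column names used for illustration. It also shows dropping rows before splitting off the target, so that X and y keep the same rows.

```python
import numpy as np
import pandas as pd

# Invented sample; only the SalePrice and Id names come from the question.
df = pd.DataFrame({
    "Id": [7, 8, 9],
    "LotFrontage": [75.0, np.nan, 51.0],
    "MasVnrArea": [186.0, 240.0, np.nan],
    "SalePrice": [307000, 200000, 129900],
})

# Drop only the rows where LotFrontage is missing; NaNs elsewhere are kept.
rows_kept = df.dropna(subset=["LotFrontage"])

# Drop rows with a NaN in any column first, then split into X and y,
# so that features and target stay aligned row for row.
clean = df.dropna()
y = clean["SalePrice"]
X = clean.drop("SalePrice", axis=1)
```

Dropping rows from X alone, after y has already been taken from the full frame, would leave the two with different lengths and the fit would fail for a different reason.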

For your second question, from what I understand, you might want to read up on the pandas object dtype.
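
As a rough illustration of the dtype question: type(X.columns) reports the type of the column index, not of the data, whereas the dtypes attribute lists one dtype per column. The sketch below uses an in-memory CSV with invented values (LotFrontage is a hypothetical column name) and relies on read_csv treating the string NA as a missing value by default.

```python
import io

import pandas as pd

# Tiny invented CSV; "NA" in a numeric column mimics the situation in the question.
csv_text = "Id,MSZoning,LotFrontage\n1,RL,65\n2,RL,NA\n3,RM,68\n"
df = pd.read_csv(io.StringIO(csv_text))

# One dtype per column. Text columns usually show up as object
# (newer pandas versions may report a dedicated string dtype instead).
column_dtypes = df.dtypes

# "NA" was parsed as NaN, so the column stays numeric (float64) rather than text.
lot_frontage_dtype = str(df["LotFrontage"].dtype)
has_nan = bool(df["LotFrontage"].isna().any())
```

So a column that is mostly numbers with NA entries is typically still a float column containing NaN, which is why select_dtypes(exclude=['object']) keeps it.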

edited Mar 01 '21 at 07:03

answered Feb 26 '21 at 07:48


Can I drop the entire row that has NA for a column, or is dropping the entire column the only option? – Manu Chadha Mar 01 '21 at 06:44

To remove rows, skip the axis=... argument. I edited my post to include this. – Alex Metsai Mar 01 '21 at 07:04
