
My code is:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Masking
import numpy as np
import pandas as pd

dataset = pd.read_csv("data/train.csv", header=0)
dataset = dataset.fillna(0)

X = dataset.drop(columns=['YearRemodAdd', "Id", "SalePrice"], axis=1)
Y = dataset[['SalePrice']]

X = pd.get_dummies(X, columns=["MSSubClass", "MSZoning",
                               "Street", "Alley", "LotShape",
                               "LandContour", "Utilities", "LotConfig",
                               "LandSlope", "Neighborhood", "Condition1",
                               "Condition2", "BldgType", "HouseStyle",
                               "YearBuilt", "RoofStyle", "RoofMatl",
                               "Exterior1st", "Exterior2nd", "MasVnrType",
                               "ExterQual", "ExterCond", "Foundation",
                               "BsmtQual", "BsmtCond", "BsmtExposure",
                               "BsmtFinType1", "BsmtFinType2", "Heating",
                               "HeatingQC", "CentralAir", "Electrical",
                               "KitchenQual", "Functional", "FireplaceQu",
                               "GarageType", "GarageFinish", "GarageQual",
                               "GarageCond", "PavedDrive", "PoolQC",
                               "Fence", "MiscFeature", "MoSold",
                               "YrSold", "SaleType", "SaleCondition"])

Ymax = Y['SalePrice'].max()
Y = Y['SalePrice'].apply(lambda x: float(x) / Ymax)

input_units = X.shape[1]
print(X)
print(Y)

model = Sequential()
model.add(Dense(input_units, input_dim=input_units, activation='relu'))
model.add(Dense(input_units, activation='relu'))
model.add(Dense(input_units, activation='relu'))

model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam', metrics=['mse'])
model.fit(X, Y, epochs=250, batch_size=50,
          shuffle=True, validation_split=0.05, verbose=2)

scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

My data is like:

Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,LotConfig,LandSlope,Neighborhood,Condition1,Condition2,BldgType,HouseStyle,OverallQual,OverallCond,YearBuilt,YearRemodAdd,RoofStyle,RoofMatl,Exterior1st,Exterior2nd,MasVnrType,MasVnrArea,ExterQual,ExterCond,Foundation,BsmtQual,BsmtCond,BsmtExposure,BsmtFinType1,BsmtFinSF1,BsmtFinType2,BsmtFinSF2,BsmtUnfSF,TotalBsmtSF,Heating,HeatingQC,CentralAir,Electrical,1stFlrSF,2ndFlrSF,LowQualFinSF,GrLivArea,BsmtFullBath,BsmtHalfBath,FullBath,HalfBath,BedroomAbvGr,KitchenAbvGr,KitchenQual,TotRmsAbvGrd,Functional,Fireplaces,FireplaceQu,GarageType,GarageYrBlt,GarageFinish,GarageCars,GarageArea,GarageQual,GarageCond,PavedDrive,WoodDeckSF,OpenPorchSF,EnclosedPorch,3SsnPorch,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
1,60,RL,65,8450,Pave,NA,Reg,Lvl,AllPub,Inside,Gtl,CollgCr,Norm,Norm,1Fam,2Story,7,5,2003,2003,Gable,CompShg,VinylSd,VinylSd,BrkFace,196,Gd,TA,PConc,Gd,TA,No,GLQ,706,Unf,0,150,856,GasA,Ex,Y,SBrkr,856,854,0,1710,1,0,2,1,3,1,Gd,8,Typ,0,NA,Attchd,2003,RFn,2,548,TA,TA,Y,0,61,0,0,0,0,NA,NA,NA,0,2,2008,WD,Normal,208500
2,20,RL,80,9600,Pave,NA,Reg,Lvl,AllPub,FR2,Gtl,Veenker,Feedr,Norm,1Fam,1Story,6,8,1976,1976,Gable,CompShg,MetalSd,MetalSd,None,0,TA,TA,CBlock,Gd,TA,Gd,ALQ,978,Unf,0,284,1262,GasA,Ex,Y,SBrkr,1262,0,0,1262,0,1,2,0,3,1,TA,6,Typ,1,TA,Attchd,1976,RFn,2,460,TA,TA,Y,298,0,0,0,0,0,NA,NA,NA,0,5,2007,WD,Normal,181500

My results are:

Epoch 123/250
 - 0s - loss: 3.8653 - mean_squared_error: 0.0687 - val_loss: 3.8064 - val_mean_squared_error: 0.0639
Epoch 124/250

It gets stuck there after like 2 epochs. What can I do to prevent it from getting stuck so quickly?

Shamoon
  • First you should define whether this is a regression or a classification problem. Then look at your target variable, which has large values (208500 and 181500), and at the sigmoid activation on the output, which means the network can only predict values in [0, 1]. There is no way the network can learn to predict values that high with this setup. You need to normalize your targets. – Dr. Snoopy Dec 31 '18 at 15:46
  • @MatiasValdenegro I added normalization and edited the code above. Same issue – Shamoon Jan 01 '19 at 02:21
  • There are some NoneType values in your dataset. Get rid of them before feeding your data to the neural network. – Mitiku Jan 01 '19 at 06:38
  • @Mitiku what's wrong with having `None` in my data? – Shamoon Jan 01 '19 at 15:20
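
To make the missing-value point concrete: any NaN/None left in the inputs propagates through the forward pass and turns the loss into nan. Here is a small check, as a sketch assuming the `X` and `Y` objects built in the code above, that can be run before `fit`:

import numpy as np

# Any NaN that survives preprocessing propagates through the matrix
# multiplications and makes the loss nan, so verify the inputs first.
print(X.isnull().values.any())   # should print False
print(Y.isnull().values.any())   # should print False

# A stray non-numeric column would also break training; this cast fails
# loudly if one remains after get_dummies.
X = X.astype('float32')
print(np.isfinite(X.values).all())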

1 Answer


It seems you are working on a regression problem (i.e. predicting continuous values). There are at least two things you need to consider:

  1. As @Mitiku has mentioned in the comments, there are some NA (i.e. missing) values in the data. This is one of the things that can make the loss become nan. Either drop the rows that contain NA values, or replace them with a specific value such as 0. See this answer for more on dealing with missing data.

  2. Using accuracy as the metric for a regression problem does not make sense, as it is only valid for classification tasks. Instead, use a regression metric such as mse (mean squared error) or mae (mean absolute error).

Please apply the two points above in your code, then report back how the training goes, and I'll update this answer as needed.
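
For reference, here is a minimal sketch of what the two changes could look like, assuming `X`, `Y`, and `input_units` are prepared as in the question; the layer sizes and the adam optimizer are kept from the question, while the linear output layer and `mse` loss are my choices for a standard regression setup:

from keras.models import Sequential
from keras.layers import Dense

# 1. Handle missing values before training: either drop the affected rows
#    or fill them with a constant such as 0.
X = X.fillna(0)
Y = Y.fillna(0)

# 2. Use a regression loss/metric and a linear output layer, so the network
#    can predict any (normalized) value rather than a probability.
model = Sequential()
model.add(Dense(input_units, input_dim=input_units, activation='relu'))
model.add(Dense(input_units, activation='relu'))
model.add(Dense(1))  # linear activation by default

model.compile(loss='mse', optimizer='adam', metrics=['mae'])
model.fit(X, Y, epochs=250, batch_size=50,
          shuffle=True, validation_split=0.05, verbose=2)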

today
  • Updated to adjust to `mse` and replace `NA` values – Shamoon Jan 04 '19 at 03:40
  • @Shamoon So, by "gets stuck" you mean the accuracy does not improve? Have you tried adding more layers or increasing the number of units in the existing layers? You may also need to add regularization after doing this to prevent overfitting. – today Jan 04 '19 at 17:44
  • I have added a ton more layers and units. Still no dice. I even added dropout – Shamoon Jan 08 '19 at 20:41
  • @Shamoon There are two points you need to consider: 1) Adding lots of layers does not necessarily lead to a better model. You need to be systematic about designing your model as well as preparing the data. For example, one good approach is to first try a very basic model, say a simple linear regression model or a NN with just one layer, and see how it performs. This serves as your baseline. Then incrementally increase the capacity of your model by adding more layers or trying different architectures, and compare each of these to the baseline to assess the effect of every >> – today Jan 08 '19 at 21:48
  • @Shamoon >> change you make. 2) It might be that you are modeling the problem incorrectly and need to formulate it differently, or preprocess and prepare the data in another format. It may also be that the problem is inherently difficult to solve, in the sense that there may not even be a complex function that perfectly, or even fairly, maps the inputs to the outputs, e.g. stock market prediction. – today Jan 08 '19 at 21:51
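
As a concrete version of the baseline idea above, here is a sketch assuming the `X` and normalized `Y` from the question: a single Dense layer with a linear output is effectively linear regression, and its validation error gives a floor that any deeper architecture should beat before more layers, units, or dropout are added.

from keras.models import Sequential
from keras.layers import Dense

# Baseline: one linear layer is equivalent to linear regression on the
# one-hot encoded features.
baseline = Sequential()
baseline.add(Dense(1, input_dim=X.shape[1]))
baseline.compile(loss='mse', optimizer='adam', metrics=['mae'])

history = baseline.fit(X, Y, epochs=50, batch_size=50,
                       validation_split=0.05, verbose=2)

# Compare deeper models against this number before adding capacity.
print(min(history.history['val_loss']))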