
Some context about my project: I intend to study how various bullet parameters (weight, caliber, sectional density, etc.) affect the ballistic coefficient (i.e. bullet performance) of the projectile. I feel that I may have gone about this all wrong, though; I am just reading through tutorials and applying whatever seems useful and relevant to my project.

The output of my regression model looks off to me; model.fit() reports an MSE of 0.0201 on every epoch of training.

Also, model.predict(X) appears to be 100% accurate, which does not seem right. I borrowed some code from a Keras tutorial to print the model's predictions alongside the expected outputs.

This is the program that constructs and trains the model:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow.keras.callbacks import TensorBoard

from pandas.plotting import scatter_matrix

import time

name = 'Bullet Database Analysis v2-{}'.format(int(time.time()))

tensorboard = TensorBoard(log_dir='logs/{}'.format(name))

physical_devices = tf.config.list_physical_devices('GPU') 
tf.config.experimental.set_memory_growth(physical_devices[0], True)

df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')

from sklearn.model_selection import train_test_split
from sklearn import preprocessing
dataset = df.values
X = dataset[:,0:12]
X = np.asarray(X).astype(np.float32)

y = dataset[:,13]
y = np.asarray(y).astype(np.float32)

X_train, X_val_and_test, y_train, y_val_and_test = train_test_split(X, y, test_size=0.3, shuffle=True)
X_val, X_test, y_val, y_test = train_test_split(X_val_and_test, y_val_and_test, test_size=0.5)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

model = Sequential(
    [
        # 2430 is roughly the number of rows in X_train
        #BatchNormalization(axis=-1, momentum = 0.1),
        Dense(2430, activation='relu'),
        Dense(32, activation='relu'),
        Dense(1),
    ]
)

model.compile(loss='mse', metrics=['mse'])

history = model.fit(X_train, y_train, 
                    batch_size=64, 
                    epochs=20, 
                    validation_data=(X_val, y_val),
                    #callbacks = [tensorboard]
                    )
# plt.plot(history.history['loss'],'r')
# plt.plot(history.history['val_loss'],'m')

plt.plot(history.history['mse'],'b')
plt.show()

model.summary()

model.save(r"Bullet Optimization\Bullet Database Analysis.h5")

Here is the code that loads my previously trained model from the h5 file:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
import pandas as pd

df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')

model = load_model(r'Bullet Optimization\Bullet Database Analysis.h5')

dataset = df.values
X = dataset[:,0:12]
y = dataset[:,13]

model.fit(X,y, epochs=10)

#predictions = np.argmax(model.predict(X), axis=-1)
predictions = model.predict(X)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))

This is the output:

Epoch 1/10
2021-03-09 10:38:06.372303: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-03-09 10:38:07.747241: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
109/109 [==============================] - 2s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 2/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 3/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 4/10
109/109 [==============================] - 0s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 5/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 6/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 7/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 8/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
Epoch 9/10
109/109 [==============================] - 1s 5ms/step - loss: 0.0201 - mse: 0.0201
Epoch 10/10
109/109 [==============================] - 0s 4ms/step - loss: 0.0201 - mse: 0.0201
[0.314, 7.9756, 100.0, 100.0, 31.4, 0.00314, 318.4713376, 6.480041472000001, 0.51, 12.95400001, 4.067556004, 0.145] => 0 (expected 0)
[0.358, 9.0932, 148.0, 148.0, 52.983999999999995, 0.002418919, 413.4078212, 9.590461379, 0.635, 16.12900002, 5.774182006, 0.165] => 0 (expected 0)
[0.313, 7.9502, 83.0, 83.0, 25.979, 0.003771084, 265.1757188, 5.378434422000001, 0.504, 12.80160001, 4.006900804, 0.121] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.4, 10.16000001, 2.5501600030000002, 0.113] => 0 (expected 0)
[0.251, 6.3754, 50.0, 50.0, 12.55, 0.00502, 199.20318730000002, 3.2400207360000004, 0.41, 10.41400001, 2.613914003, 0.113] => 0 (expected 0)

Here is a link to my training dataset. Within my code, I used train_test_split to create the train and test datasets.

Lastly, is there a way within TensorBoard to visualize how the model fits the dataset? I feel that although my model is training, it is not actually fitting the data in any meaningful way, even though the MSE is reduced.

  • please show the training data produced by model.fit – Gerry P Mar 09 '21 at 04:05
  • Sorry, I am not quite sure what you mean, but I called `model.fit(...).history` and this is the output: `{'loss': [0.020089702680706978, 0.020084749907255173, 0.02007758617401123, 0.0200809258967638, 0.020072614774107933, 0.0200900100171566, 0.020088041201233864, 0.02009417489171028, 0.020077534019947052, 0.02009080909192562], 'mse': [0.020089702680706978, 0.020084749907255173, 0.02007758617401123, 0.0200809258967638, 0.020072614774107933, 0.0200900100171566, 0.020088041201233864, 0.02009417489171028, 0.020077534019947052, 0.02009080909192562]}` – Jerome Ariola Mar 09 '21 at 07:30
  • Does this answer your question? [NaN loss when training regression network](https://stackoverflow.com/questions/37232782/nan-loss-when-training-regression-network) – Innat Mar 09 '21 at 08:39
  • Thank you, but not really; it did help me when my output used to be NaN, however, now my issue is with actually using the trained model. – Jerome Ariola Mar 09 '21 at 10:49

1 Answer


It's because you have NaN values in your dataset. Before splitting, you can check for them with df.isna().sum(). NaNs have a negative impact on your network. Here I simply dropped them (df.dropna(inplace=True, axis=0)), but you can use imputation techniques to replace them instead.
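
For reference, a minimal sketch of that cleaning step (the SimpleImputer alternative is only an illustration, not something from the original answer):

import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')

# Count missing values per column before splitting
print(df.isna().sum())

# Option 1: drop every row that contains a NaN
df = df.dropna(axis=0)

# Option 2 (alternative): impute missing values with the column mean
# imputer = SimpleImputer(strategy='mean')
# df = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)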

Also, 2430 neurons can be overkill for this data; start with fewer neurons.

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1),
    ]
)

Here is the last epoch:

Epoch 20/20
27/27 [==============================] - 0s 8ms/step - loss: 8.2077e-04 - mse: 8.2077e-04 - 
                                         val_loss: 8.5023e-04 - val_mse: 8.5023e-04

When doing regression, accuracy is not a meaningful metric. You can use model.evaluate(X_test, y_test), or get predictions with model.predict and compute other regression metrics to see how close your predictions are.
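
As a rough illustration (assuming the model and the train/test split from the question are already in scope; the metric choices are just examples):

from sklearn.metrics import mean_absolute_error, r2_score

# evaluate() reports the compiled loss and metrics (here both are MSE) on held-out data
test_loss, test_mse = model.evaluate(X_test, y_test, verbose=0)

# Or compare predictions against the targets with other regression metrics
y_pred = model.predict(X_test).flatten()
print('MAE:', mean_absolute_error(y_test, y_pred))
print('R^2:', r2_score(y_test, y_pred))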

  • I did myself a favor by posting all my code; thank you for kindly fixing it up! However, I want to know what the first parameter of `Dense()` (the number of units) actually does; I thought I should use 2430 because the dataset is around that size. Also, I posted the wrong dataset; https://drive.google.com/file/d/1Usv4aIhb03HJcPdtSX_iypqMnk4XRznO/view?usp=sharing is the version where the NaNs have been trimmed out – Jerome Ariola Mar 09 '21 at 10:37
  • I'd also like to add that there didn't seem to be issues during training, but I instead had issues during testing out the regression when I loaded the model as an h5 file in a separate program. This is the output when training ```Epoch 16/20 31/31 [==============================] - 0s 7ms/step - loss: 0.0282 - mse: 0.0282 - val_loss: 0.0244 - val_mse: 0.0244``` – Jerome Ariola Mar 09 '21 at 10:43
  • 1) The number of units is the number of neurons in that layer. 2) You need to scale that dataset; as you can see, some values are very high compared to others, like *0.003* and *265.17*. Also, when you load the model with *load_model()* you don't need to fit or compile it again. Just get the predictions, because that model already contains the learnt weights. – Frightera Mar 09 '21 at 10:55
  • Sorry, but how exactly do I scale my dataset? I've been doing machine learning for a while, but it's high time I understood these little statistical techniques – Jerome Ariola Mar 10 '21 at 16:46
  • Have a look at `sklearn.preprocessing.StandardScaler` (see the sketch below). Also, please consider accepting the answer if it solved your problem. – Frightera Mar 10 '21 at 17:52
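
Following up on the scaling and load_model() points in the comments above, a minimal sketch, assuming the same CSV layout and file paths as the question (everything else is illustrative):

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv(r'Bullet Optimization\ShootForum Bullet DB_2.csv')
dataset = df.values
X = dataset[:, 0:12].astype(np.float32)
y = dataset[:, 13].astype(np.float32)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)

# Fit the scaler on the training split only, then apply the same transform
# to the test split so no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ... build, compile, and fit the model on X_train as in the answer ...

# When reloading the saved h5 model later, reuse the same fitted scaler on new
# inputs and call predict() directly; no extra fit() or compile() is needed.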