I'm new to the neural network world and made an attempt to write a prediction algorithm with TensorFlow/Keras. This code is just trying to predict a Roc value depending on Alt and Temp, based on a graph.

(Not able to show the graph here though.)

After a lot of attempts I got some accuracy, about 0.2 to 0.5. Not great, but at least I had something to work with. After a while it dropped to 0, and however I tweak it, it doesn't give me any accuracy at all. Any idea why I'm not getting any accuracy?

#import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import sklearn.model_selection

#Data collection
factor = 10
data = pd.read_csv("roc_6800_ibf.csv", sep=",")
data = data.apply(pd.to_numeric, errors='coerce')
data = (data / factor) + 5

predict = "Roc"

x = np.array(data.drop([predict], axis=1))
y = np.array(data[predict])

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(
    x, y, test_size=0.2)

x_shape = int(x.ndim)
y_shape = int(y.ndim)

#Model

model = keras.Sequential([
    keras.layers.Dense(2, input_shape=(2,), activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1, activation="relu")
])

model.compile(optimizer="adam", loss="MeanSquaredError", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=20, batch_size=10, verbose=1)

results = model.evaluate(x_test, y_test)

print("- - - - - - - - - - - - - - - - - - - - - - - -")
print(results)

#Prediction

def dataPredict(inputvalues, outputvalues):
    print("- - - - - - - - - - - - - - - - - - - - - - - -")
    test_q = np.array([inputvalues])
    test_a = outputvalues
    prediction = model.predict((test_q / factor) + 5)

    print("Prediction " + str((prediction[0] - 5) * factor))
    print("Actual " + str(test_a[0]))
    print("Input " + str(test_q))


dataPredict([5.5,20.0],[3.6])
dataPredict([6.8,30.0],[0.4])

My input data is about 80 rows of points that I have read off the graph myself, and it looks like this. I want to take Alt and Temp and get Roc.

Updated the dataset, 72 rows:

Alt,Temp,Roc
-1.0,-40.0,9.6
0.0,-40.0,9.6
1.0,-40.0,9.6
2.0,-40.0,9.6
3.0,-40.0,9.6
4.0,-40.0,9.6
5.0,-40.0,9.6
6.0,-40.0,9.6
7.0,-40.0,8.1
8.0,-40.0,7.9
7.5,-40.0,9.1
-1.0,0.0,9.6
0.0,0.0,9.6
1.0,0.0,9.6
2.0,0.0,9.6
2.1,0.0,9.6
3.0,0.0,9.0
4.0,0.0,8.0
5.0,0.0,6.6
6.0,0.0,5.5
7.0,0.0,4.2
8.0,0.0,3.2
-1.0,20.0,9.6
0.0,20.0,9.6
0.5,20.0,9.0
1.0,20.0,8.6
2.0,20.0,7.8
3.0,20.0,6.2
4.0,20.0,5.2
5.0,20.0,4.0
6.0,20.0,2.9
7.0,20.0,1.8
8.0,20.0,0.5
-1.0,40.0,7.5
0.0,40.0,6.8
1.0,40.0,5.6
2.0,40.0,4.2
3.0,40.0,3.2
4.0,40.0,2.2
5.0,40.0,1.0
-1.0,50.0,5.4
0.0,50.0,4.2
-0.5,-40.0,9.5
0.5,-40.0,9.5
1.5,-40.0,9.5
2.5,-40.0,9.5
3.5,-40.0,9.5
4.5,-40.0,9.5
5.5,-40.0,9.5
6.5,-40.0,9.1
7.5,-40.0,8.1
-0.5,-10.0,9.5
0.5,-10.0,9.5
1.5,-10.0,9.5
2.5,-10.0,9.5
3.5,-10.0,9.5
4.5,-10.0,8.3
5.5,-10.0,7.1
6.5,-10.0,6.0
7.5,-10.0,5.0
-0.5,30.0,8.4
0.5,30.0,7.6
1.5,30.0,6.4
2.5,30.0,5.5
3.5,30.0,4.2
4.5,30.0,3.1
5.5,30.0,1.9
6.5,30.0,0.8
7.5,30.0,-0.5
5.2,10.0,5.3
6.8,10.0,4.0

I have tried tweaking the dataset (input data) in the code to make all numbers positive and divide them by 10. That gave me the best result so far, but suddenly it just dropped to 0:

Epoch 20/20
6/6 [==============================] - 0s 2ms/step - loss: 32.5049 - accuracy: 0.0000e+00
  • Please provide the dataset link as well. – Gautam Chettiar Sep 23 '22 at 20:16
  • The dataset is the input data I have two-thirds of the way down my post. Or do you want the whole dataset of 80 lines? – Bengt B Sep 23 '22 at 20:19
  • Till then, depending on the type of data and the dataset size, carry out appropriate feature extraction and find out which variables are most correlated to the label. Coming to your model, try to make a dense net with a larger number of units, as your model is really small. Also consider adding dropout to avoid overfitting, since your dataset is small. Find a suitable optimizer and loss function for compiling the model. It may be the loss function causing the issue in your case. – Gautam Chettiar Sep 23 '22 at 20:20
  • Entire Dataset will help – Gautam Chettiar Sep 23 '22 at 20:21
  • The dataset is updated, 72 rows. Can it be that the dataset is too small? Should I just try to make a larger one? – Bengt B Sep 23 '22 at 20:24
  • Thanks Gautam, I will try a mix of that. – Bengt B Sep 23 '22 at 20:25
  • I couldn't explain it further, so I churned out some code below. It was really quick, so it's not perfect either. – Gautam Chettiar Sep 23 '22 at 20:55
  • You are using accuracy as the metric which is wrong. It is for classification problems. – Frightera Sep 23 '22 at 21:25
  • Yet another question about accuracy in regression (which does not make sense). – Dr. Snoopy Sep 23 '22 at 21:56
  • You can get better results with the network when you: add a `Normalization` layer at the beginning, change the `batch_size` to the length of the training data (56), increase the epoch number (500 or more), and use 'mape' for loss – AndrzejO Sep 24 '22 at 15:22
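
To illustrate AndrzejO's last comment above, here is a minimal sketch of what that setup could look like on top of the question's existing x_train/x_test split (a recent TensorFlow is assumed for keras.layers.Normalization; the layer sizes and the linear output are assumptions, not taken from the original code):

from tensorflow import keras
from tensorflow.keras import layers

# Normalization learns the mean/variance of each input column from the training data
norm = layers.Normalization()
norm.adapt(x_train)

model = keras.Sequential([
    keras.Input(shape=(2,)),
    norm,
    layers.Dense(4, activation="relu"),
    layers.Dense(1)  # linear output for a regression target
])

model.compile(optimizer="adam", loss="mape", metrics=["mae"])
model.fit(x_train, y_train, epochs=500, batch_size=len(x_train), verbose=0)
print(model.evaluate(x_test, y_test))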

1 Answer


Alright, so I tried implementing some ML on your dataset (TL;DR: XGBoost worked better in this case).

Now that I have had a look at the dataset: your accuracy comes out as 0 because this is a regression task, and your output is a continuous number, not a class label like [0 or 1]. Exact matches between the predicted and true values will therefore almost never happen, hence the 0 accuracy. A better way to evaluate this kind of task is to use regression losses such as MAE, MSE, RMSE or MAPE, and for an accuracy-like score you can use R squared.
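
For example, here is a minimal sketch of that idea applied directly to the question's own model, reusing its x_train/x_test split (the loss and metric choices are just illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(2, input_shape=(2,), activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(1)  # regression outputs normally use a linear activation
])

# MSE as the loss, MAE/MAPE as regression metrics instead of "accuracy"
model.compile(optimizer="adam", loss="mse", metrics=["mae", "mape"])
model.fit(x_train, y_train, epochs=20, batch_size=10, verbose=1)
print(model.evaluate(x_test, y_test))  # [mse, mae, mape]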

Anyway here's the code:

import pandas as pd
import xgboost
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sample_data_1.csv") # Your dataset

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df[['Alt','Temp']], df['Roc'], test_size=0.3)

So first I fitted a linear model on your data, because both the data entries and the complexity of the task seemed pretty simple:

lin_model = LinearRegression()
lin_model.fit(x_train, y_train)
preds = lin_model.predict(x_test)

from sklearn.metrics import r2_score
"Accuracy is " + str(r2_score(preds, y_test))
Output: 'Accuracy is 0.6826956688194117'

As you can see, the linear model got low accuracy, but now it's certain that the inputs are related to the outputs in some fashion.

Next I tried a Keras model similar to yours. The code is below:

import tensorflow as tf
import tensorflow.keras.layers as layers

model = tf.keras.Sequential([
    layers.Dense(1000, activation = 'relu', input_shape = (2, )),
    layers.Dropout(0.2),
    layers.Dense(500, activation = 'relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation = 'relu')
])

model.compile(optimizer = 'adam', loss = 'mape', metrics=['mape','mae','mse'])
model.fit(x_train, y_train, epochs = 100, batch_size = 16)
model.evaluate(x_test, y_test)
Output: 1/1 [==============================] - 0s 130ms/step - loss: 53.3907 - mape: 53.3907 - mae: 2.6886 - mse: 15.3293

The results here are really poor, as the loss (MAPE) is pretty much 50%, but if you look at the mean absolute error, in magnitude it's not a lot.

That suggests the model could have performed better if the data were scaled down using MinMaxScaler() from scikit-learn's preprocessing module. (You can try that; see the sketch below.)
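
A minimal sketch of that idea, assuming the x_train/x_test split from above (the scaler is fitted on the training data only):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn per-column min/max from the training data
x_test_scaled = scaler.transform(x_test)        # apply the same scaling to the test data

# rebuild/recompile the model before this step so training starts from scratch
model.fit(x_train_scaled, y_train, epochs=100, batch_size=16)
model.evaluate(x_test_scaled, y_test)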

Finally I implemented an XGBoost model, which performed much better than the rest:

xgb_clf = xgboost.XGBRegressor(
    learning_rate=0.3,
    max_depth=6,
    n_estimators=1000
)
xgb_clf.fit(x_train, y_train)
preds = xgb_clf.predict(x_test)
"Accuracy is " + str(r2_score(preds, y_test))
Output: 'Accuracy is 0.8968514145069562'

Almost 90%. And keeping in mind the rudimentary state of the data and the minimal preprocessing, the XGBoost model could gain another 5 to 6% in accuracy if proper processing and augmentation were used.
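
Beyond preprocessing, a small hyperparameter search is another routine way to squeeze a bit more out of XGBoost; a hedged sketch, with illustrative parameter ranges only:

from sklearn.model_selection import GridSearchCV
import xgboost

param_grid = {
    "learning_rate": [0.05, 0.1, 0.3],
    "max_depth": [3, 4, 6],
    "n_estimators": [200, 500, 1000],
}
# 3-fold cross-validated search scored with R squared
search = GridSearchCV(xgboost.XGBRegressor(), param_grid, scoring="r2", cv=3)
search.fit(x_train, y_train)
print(search.best_params_, search.best_score_)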

Cheers!

  • Brilliant! Thanks for the help. Sorry, I probably should have known, but like I said, I'm brand new to this. But I will learn. Thanks! – Bengt B Sep 23 '22 at 22:13