
I have a dataset (CSV) with the format shown below:

First column: random integers

Second column: The class of each integer (called bins)


Bins have been made after preprocessing; for example, integers between 1000 and 1005 belong in bin number 0, integers between 1006 and 1011 belong in bin number 1, and so on.
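
For illustration, a minimal sketch of that mapping, assuming every bin covers a fixed width of 6 integers starting at 1000 (the actual preprocessing may differ):

# Hypothetical helper, not part of my pipeline, just restating the rule above:
# 1000-1005 -> bin 0, 1006-1011 -> bin 1, ...
def to_bin(value, start=1000, width=6):
    return (value - start) // width

print(to_bin(1003))  # 0
print(to_bin(1007))  # 1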

The target column for my neural network is the column of bins (the second column).

I use one-hot encoding for my target column and transform every bin number into a binary vector. I have 3557 different bins (classes).
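
To illustrate what those binary vectors look like, here is a toy sketch with 4 bins instead of 3557 (it uses Keras' to_categorical just for brevity; my actual code below uses sklearn's OneHotEncoder):

from tensorflow.keras.utils import to_categorical

bins = [0, 2, 1, 3]  # toy bin labels
print(to_categorical(bins, num_classes=4))
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 0. 1.]]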

I trained and evaluated it, with 99.7% accuracy as a result.

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from keras import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split

df = pd.read_csv("/dbfs/FileStore/tables/export78.csv")

# One-hot encode column 1 (the bin column)
onehotencoder = OneHotEncoder(categorical_features = [1])
data2 = onehotencoder.fit_transform(df).toarray()
dataset = pd.DataFrame(data2)

# Features: the raw integer column; target: the 3557 one-hot bin columns
X = dataset.iloc[:, 3557].astype(float)
y = dataset.iloc[:, 0:3557].astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)


classifier = Sequential()
# First hidden layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal', input_dim=1))
# Second hidden layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal'))
# Output layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal'))

#Compiling the neural network
classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics=['accuracy'])

#Fitting the data to the training dataset
classifier.fit(X_train,y_train, batch_size=50, epochs=10)

accr = classifier.evaluate(X_test, y_test)
print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0] ,accr[1]))

classifier.save("model.h67")


# Reload the saved model and predict on the test inputs
data1 = np.array(X_test)
List = [data1]
model = tf.keras.models.load_model("model.h67")
prediction = model.predict([(data1)])
target = (np.argmax(prediction, axis=0))
dataset1 = pd.DataFrame(target)
display(dataset1)

THE PROBLEM:

When I try to predict manually using my model, I can't get the right results. As prediction input I give a CSV with only one column of random integers, and I want the bins they belong to as a result.


2 Answers


Do you get an error message or just wrong predictions? This is not clear from your question.

Try:

prediction = model.predict(data1)

Edit:

I have 3557 different bins (classes).

classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics=['accuracy'])

Then `binary_crossentropy` is not the right choice of loss function; try `categorical_crossentropy`.
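
That is, only the loss argument needs to change:

classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])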

Tinu
  • Just wrong predictions... I use X_test as input in order to check the results, just for debugging, but all the results are wrong. For example, integer 1005 should give bin 1 as a result, but the prediction gives bin 5 for no reason. – Spyros Spyropoulos Oct 29 '19 at 14:27
  • As output I have 3557 outputs that have values 0 or 1... Integer 1000 belongs in bin 0, so in the binary vector of length 3557 the only 1 appears in the first column and the rest of the columns have value 0. For this reason I use binary_crossentropy. Is it wrong? – Spyros Spyropoulos Oct 29 '19 at 14:51
  • I see what you mean, but this is a misconception. `binary_crossentropy` is used if you have only two classes, i.e. 0/1, whereas `categorical_crossentropy` is used for multiclass scenarios like yours. – Tinu Oct 29 '19 at 14:54
  • Which loss to choose depends on the number of classes, not on the way you encode your labels. If you didn't one-hot encode them but just passed them as integers, you could use `sparse_categorical_crossentropy` as the loss (see the sketch after these comments). – Tinu Oct 29 '19 at 14:56
  • I understand: I use one-hot, so I should use categorical_crossentropy. But do you find any other mistake (e.g. "sigmoid", the 3557 outputs)? I am not sure that everything else works fine. – Spyros Spyropoulos Oct 29 '19 at 15:01
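
A minimal sketch of the integer-label variant mentioned in the comments above (hypothetical toy data with 3 bins, just to show the shapes; with sparse_categorical_crossentropy the targets stay plain bin indices, no one-hot encoding needed):

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

X = np.array([[1001.], [1007.], [1013.]])  # raw integers as input
y = np.array([0, 1, 2])                    # bin indices, not one-hot vectors

model = Sequential([
    Dense(64, activation='relu', input_dim=1),
    Dense(3, activation='softmax'),        # toy example with 3 classes
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, epochs=1, verbose=0)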

There are several issues with your code.

To start with:

I trained and evaluated it, with 99.7% accuracy as a result.

This is a known issue (spurious high accuracy) when one erroneously uses binary_crossentropy loss for a multi-class classification problem.

Second, you are also erroneously using activation='sigmoid' in your last layer, where it should be activation='softmax'.

Third, get rid of all the activation='sigmoid' in the rest of your layers and replace them with relu.

Last, you should get rid of all the kernel_initializer='random_normal' arguments in your model layers; leave the argument undefined, so that it defaults to the (much better) glorot_uniform (docs).

All in all, here is how your model should look:

classifier = Sequential()
classifier.add(Dense(3557, activation='relu', input_dim=1))
classifier.add(Dense(3557, activation='relu'))
classifier.add(Dense(3557, activation='softmax'))

classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

That's very general advice, just for starters; a 3557-class problem is not trivial, nor is it clear why you have chosen to go with 3 layers, all of them with the same number (3557) of nodes. Experiment with the architecture, keeping in mind the above points...
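
With a softmax output, the predicted bin for each sample is then the per-row argmax. A sketch, assuming X_test holds the raw integer column (note axis=1, not axis=0: you want one prediction per sample, not per class):

import numpy as np

prediction = classifier.predict(np.asarray(X_test).reshape(-1, 1))  # shape (n_samples, 3557)
predicted_bins = np.argmax(prediction, axis=1)                      # one bin index per sample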

desertnaut
  • Well, `softmax` is really just rescaling your outputs to valid probabilities; if you use the argmax of your outputs to get a class prediction, it doesn't make a difference. – Tinu Oct 29 '19 at 15:16
  • Thank you very much. I will keep your advice in mind. Do you believe one-hot encoding the target classes is a good choice for my problem? – Spyros Spyropoulos Oct 29 '19 at 15:17
  • @Tinu but not with a `sigmoid` before – desertnaut Oct 29 '19 at 15:17
  • Regarding the hidden layer activation and the kernel initializer, these are just hyperparameters. There is no explicit law for how to choose them; the ones you mentioned are just regarded as best practices. – Tinu Oct 29 '19 at 15:18
  • @Tinu what exactly is your point? it is obvious that we are talking about such best practices here (what else?). Anyone who would like to stick with `sigmoid`, claiming that there is not any "law" against it (!), well, good luck, just don't complain afterwards and wonder [why "*it doesn't learn*"](https://stackoverflow.com/questions/58608113/neural-network-isnt-learning-for-a-first-few-epochs-on-keras#comment103527313_58608113)... – desertnaut Oct 29 '19 at 15:23