
I have 50 target classes and 300 data samples.

This is my sample dataset, with 98 features:


import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Load the data (the CSV has no header row)
dataset = pd.read_csv(root_path + 'pima-indians-diabetes.data.csv', header=None)

X = dataset.iloc[:, 0:8]
y = dataset.iloc[:, 8]

# Standardize the features
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)

# Split into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

from keras import Sequential
from keras.layers import Dense

classifier = Sequential()
# First hidden layer
classifier.add(Dense(units=10, activation='relu', kernel_initializer='random_normal', input_dim=8))
# Second hidden layer
classifier.add(Dense(units=10, activation='relu', kernel_initializer='random_normal'))
# Output layer
classifier.add(Dense(units=1, activation='sigmoid', kernel_initializer='random_normal'))

# Compiling the neural network
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fitting the model to the training data
classifier.fit(X_train, y_train, batch_size=2, epochs=10)

I get 19% accuracy here, and I don't know how to improve my prediction results.

jkdev

1 Answer


I assume that you have applied a dimensionality-reduction technique to your original data with 98 features, and that is why you are feeding an 8-dimensional input to your model.
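
If that is the case, a minimal sketch of such a reduction step (assuming PCA from scikit-learn; your actual technique may be different) could look like this:

from sklearn.decomposition import PCA

# Hypothetical reduction of the original 98 features down to 8 components
pca = PCA(n_components=8)
X = pca.fit_transform(X)  # X was the (n_samples, 98) feature matrix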

I have a few observations on your implementation:

[As a Classification Problem]

As you have mentioned that your samples belong to 50 different classes, the problem is certainly a multiclass classification problem. So, you need to one-hot encode your labels first, like:

from keras.utils import to_categorical
y = to_categorical(y, num_classes=50, dtype='float32')
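
Note that to_categorical expects integer class indices in the range 0–49. If your raw labels are strings or arbitrary numbers, you can map them to integers first; here is a small sketch assuming scikit-learn's LabelEncoder (not part of your original code):

from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

le = LabelEncoder()
y_int = le.fit_transform(y)                  # raw labels -> integers 0..49
y = to_categorical(y_int, num_classes=50)    # one-hot vectors of length 50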

In that case, you also need to change the number of output nodes (one per class) and the activation function in the final layer as follows:

classifier.add(Dense(units = 50, activation='softmax'))

Furthermore, you have to use categorical_crossentropy as the loss function when compiling your model.

classifier.compile(optimizer ='adam',loss='categorical_crossentropy', metrics =['accuracy'])
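
Putting those changes together, the classification version of your network could look like the following sketch (layer sizes and initializers are kept from your original code; the larger batch size is only a suggestion):

from keras import Sequential
from keras.layers import Dense

classifier = Sequential()
classifier.add(Dense(units=10, activation='relu', kernel_initializer='random_normal', input_dim=8))
classifier.add(Dense(units=10, activation='relu', kernel_initializer='random_normal'))
classifier.add(Dense(units=50, activation='softmax'))  # one output node per class

classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size=32, epochs=10)  # y_train must be one-hot encoded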

[As a Regression Problem]

You can also treat this as a multiple regression problem, since the output lies within the range 0 to 50 (continuous), and keep a single output node in the final layer as you did. But in that case, you should use a linear activation function instead of sigmoid.

So, the final layer should be like:

classifier.add(Dense(units = 1)) # default activation is linear

Additionally, in the case of a regression problem, mean_squared_error is the most relevant loss function to use (assuming there are not many outliers in your dataset), and accuracy is irrelevant as a performance metric (you may instead use mean_absolute_error, which is analogous to the loss). Hence, the second modification is:

classifier.compile(optimizer ='adam',loss='mean_squared_error')
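
If you still want a metric reported during training in the regression setup, a possible compile/fit sketch (with mean_absolute_error as the monitoring metric) is:

classifier.compile(optimizer='adam', loss='mean_squared_error', metrics=['mean_absolute_error'])
classifier.fit(X_train, y_train, batch_size=32, epochs=10)  # y_train holds the raw integer labels here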
Kaushik Roy
  • it is pretty cool, but I have a new problem when I use a larger dataset – Moch. Chamdani M Sep 24 '19 at 10:22
  • it is the same data, but larger: 41,000 records – Moch. Chamdani M Sep 24 '19 at 10:23
  • Please describe your problem here or in a new thread; there are many experts there to help. More data is good for better generalization of a neural network. Keep in mind that you have to find the optimal hyper-parameters (learning rate, batch size, epochs), and your current batch_size is too small. You may also want to tune your model, e.g. the number of hidden layers, hidden units, activation function, kernel initializer, etc., depending on the performance. – Kaushik Roy Sep 24 '19 at 10:50
  • it is actually the same case; I use an ANN to classify my data. Now I have 355 classes, and each class has 100 data samples in my dataset. I use the same model as in my program above, with sigmoid as the activation for my hidden layers and softmax as the activation for my output layer. I get 80% accuracy, but the bad news is that my loss is also more than 70% – Moch. Chamdani M Sep 24 '19 at 14:47
  • what is the best activation function for my model if I have 355 classes with 100 data samples per class? – Moch. Chamdani M Sep 24 '19 at 15:06
  • Well, you may try the Swish, ReLU, or Leaky ReLU activation functions and find the best one through empirical evaluation. How many epochs have you used for training? Have you tried different learning rates? On what basis do you terminate your training, and have you checked your train + validation loss/accuracy plots? – Kaushik Roy Sep 24 '19 at 16:02
  • I use 100 epochs. Can you please give me some explanation about acc, loss, val_acc, and val_loss? What are they, and what is the difference between them? – Moch. Chamdani M Sep 24 '19 at 16:16
  • Please read ["How to interpret “loss” and “accuracy” for a machine learning model"](https://stackoverflow.com/questions/34518656/). You may also check out Dr. Andrew Ng's Machine Learning course on Coursera. A short sketch of how these values appear during training is shown below. – Kaushik Roy Sep 25 '19 at 01:09
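
Regarding the acc/loss/val_acc/val_loss question in the comments above, here is a minimal sketch (assuming the classification setup from the answer) of training with a validation split so that Keras reports all four values after every epoch:

# validation_split holds out 20% of the training data; Keras reports loss/accuracy
# on the training portion (loss, acc) and on the held-out portion (val_loss, val_acc)
history = classifier.fit(X_train, y_train, batch_size=32, epochs=100, validation_split=0.2)

# Key names vary between Keras versions ('acc' vs 'accuracy')
print(history.history.keys())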