
After training my model, the test accuracy is always 50%. What is wrong with my code below?

Samples 0~4000 are normal signal data and samples 4001~6000 are abnormal signal data for binary classification. The data dimension is (6000, 8000).

data = np.load('data.npy') 
label = []
for i in range(len(data)): ## labeling
    if i < 4000:
        label.append(1)
    else:
        label.append(0)

label = np.array(label)

## 100 samples from each class are held out for testing
test_data =  np.concatenate((data[:100], data[4001:4101]), axis=0)  
test_label = np.concatenate((label[:100], label[4001:4101]), axis=0)
train_data = np.concatenate((data[100:4001], data[4101:]))
train_label = np.concatenate((label[100:4001], label[4101:]))

## data shuffling
tmp = [[x,y]for x, y in zip(train_data, train_label)]
tmp1 = [[x,y]for x, y in zip(test_data, test_label)]
random.shuffle(tmp)
random.shuffle(tmp1) 
train_data = [n[0] for n in tmp]
train_label = [n[1] for n in tmp]
train_data = np.array(train_data)
train_label = np.array(train_label)
teet_data = [n[0] for n in tmp1]
test_label = [n[1] for n in tmp1]
test_data = np.array(test_data)
test_label = np.array(test_label)

## scaling
mean = train_data.mean(axis=0)
std = train_data.std(axis=0)

train_data -= mean
train_data /= std
test_data -= mean
test_data /= std

model = models.Sequential()
model.add(layers.Dense(128, activation='relu', input_shape=(8000,)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='Adam',
             loss='binary_crossentropy',
             metrics=['acc'])

history = model.fit(train_data,
                    train_label,
                    epochs=60,
                    batch_size=128,
                    shuffle=True,
                    validation_split=0.2)

Loss curve: [image]

loss, acc = model.evaluate(test_data, test_label)

200/200 [==============================] - 0s 140us/step

print(acc)

0.5

hoan

3 Answers


It seems very likely that your model is predicting only one class for your test data.
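
One quick way to confirm that (a minimal sketch, reusing the model and test_data objects from the question) is to look at the distribution of the thresholded predictions:

import numpy as np

# Sigmoid outputs for the test set, thresholded at 0.5
preds = (model.predict(test_data) > 0.5).astype(int).ravel()

# If only one class appears here, the model has collapsed to a single prediction
print(np.unique(preds, return_counts=True))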

This may be caused by your feature scaling approach. You should standardize your test data using the statistics computed from the training set.
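
For reference, that looks roughly like the following (a sketch using the train_data/test_data arrays from the question; the small epsilon added to std is my own guard against zero-variance features):

# Compute the statistics on the training set only...
mean = train_data.mean(axis=0)
std = train_data.std(axis=0) + 1e-8  # avoid division by zero for constant features

# ...and apply the same statistics to both splits, so no test information leaks in
train_data = (train_data - mean) / std
test_data = (test_data - mean) / std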

tlitfin
  • mean = train_data.mean(axis=0); std = train_data.std(axis=0); train_data -= mean; train_data /= std; test_data -= mean; test_data /= std. I changed my code like this, but nothing got better; accuracy is still always 50%. – hoan Apr 21 '20 at 01:22
  • Can you check to make sure that your input features are different for all of the test samples? – tlitfin Apr 21 '20 at 03:06
  • Yes, the input features are different for all of the test samples. – hoan Apr 21 '20 at 06:07

Your model is too weak/small for that number of features. Just in your first layer, you are destroying all the information by converting 8000 features into 8! Use more units, a lot more than that, and let the model learn something instead of destroying your dataset. Right now your model is not able to predict any better than random.
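
For illustration, a wider variant of the question's architecture might look like this (only a sketch; the unit counts are arbitrary suggestions, not tuned values):

from tensorflow.keras import models, layers

model = models.Sequential([
    # a much wider first layer so the 8000 input features are not compressed too aggressively in one step
    layers.Dense(1024, activation='relu', input_shape=(8000,)),
    layers.Dense(256, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])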

leo
  • I have changed my code a little bit, but the model still predicts only one value. I guess something is wrong with the data scaling... – hoan Apr 21 '20 at 03:46
  • Then you had better check your data; maybe the data is meaningless. What is this data with 8000 features? – leo Apr 21 '20 at 08:43
  • The data is an artificially generated signal. Do you mean I need to use a feature extraction technique? – hoan Apr 21 '20 at 09:52
  • Your data is not meaningful. Even if there are some distinctive patterns in your signals, the arbitrary labeling you do at the beginning is wrong and meaningless. It is like having pictures of cats and dogs but assigning the label 1 to both cat and dog images, and also assigning the label 0 to both cat and dog images; how on earth should the model learn any distinction between those? So the problem is your data, or more specifically your labels. You can't just randomly assign labels to different entities and expect to produce something meaningful. – leo Apr 21 '20 at 11:16
  • I think my signal data does have distinct patterns; see the code below. – hoan Apr 22 '20 at 00:42

Here is my signal data.

import sounddevice as sd
import numpy as np
from math import pi

fs = 4000

n = np.arange(0, 2, 1/fs)

f = 13000  # x
f1 = 1310  # x1
f2 = 175   # x2
f3 = 45    # x3

# each signal below has shape (8000,)

x = np.sin(2*pi*f*n)
x1 = np.sin(2*pi*f1*n)
x2 = np.sin(2*pi*f2*n)
x3 = np.sin(2*pi*f3*n)
y = np.random.rand(len(x))

fault = y*0.2 + (x1 + x2 + x3) + 0.15
normal = y*0.2 + (x1 + x2) + 2

normal.shape
fault.shape

(8000,)

normal_data=[]
for i in range (4000):
    y = np.random.rand(len(x))
    normal = 2*y*(x1 + x2)
    normal_data.append(normal)

normal_data = np.array(normal_data)
normal_data.shape

(4000, 8000)

fault_data=[]
for i in range (2000):
    y = np.random.rand(len(x))
    fault = 2*y*(x1 + x2)
    fault_data.append(fault)

fault_data = np.array(fault_data)
fault_data.shape

(2000, 8000)

## Final signal data
data = np.concatenate((normal_data, fault_data))
data.shape

(6000, 8000)
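
One way to sanity-check whether the two classes are actually distinguishable (a sketch, assuming the normal_data and fault_data arrays generated above) is to compare their average magnitude spectra:

import numpy as np

# Mean magnitude spectrum of each class; identical generating formulas produce essentially identical curves here
normal_spec = np.abs(np.fft.rfft(normal_data, axis=1)).mean(axis=0)
fault_spec = np.abs(np.fft.rfft(fault_data, axis=1)).mean(axis=0)

# A tiny relative difference means no classifier can separate the classes
print(np.max(np.abs(normal_spec - fault_spec) / (normal_spec + 1e-12)))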

hoan