
I am working with a sample data set from the link below.

https://www.kaggle.com/enirtium/gender-voice/data

I am trying to open a .csv file (maybe I am opening it wrongly) and build fully connected neural network layers. Then I try to train them, but unfortunately I get an input-shape mismatch error:

"ValueError: Error when checking input: expected dense_1_input to have shape (None, 2800) but got array with shape (3168, 1)"

My code looks like this:

import csv
import numpy
import string

from keras.models import Sequential
from sklearn.model_selection import train_test_split
import numpy as np

from keras import models
from keras import layers

path = r'/Users/username/Desktop/voice.csv'

meanfreq = []
sd = []
median = []
label = []

with open(path, 'r') as csv_file:
    csv_reader = csv.reader(csv_file)

    next(csv_reader)

    for line in csv_reader:
        #print(line['meanfreq'])
        meanfreq.append(line[0])
        sd.append(line[1])
        median.append(line[2])

        if line[20] == "female":
            label.append(1)
        else:
            label.append(0)   

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(2800,)))
network.add(layers.Dense(1, activation='sigmoid'))

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

network.fit(meanfreq, label, epochs=5, batch_size=128)
scores = network.evaluate(meanfreq, label)
print("\n%s: %.2f%%" % (network.metrics_names[1], scores[1]*100))

I suppose that maybe I am not opening the .csv file correctly (it is read into plain Python lists), or there is some other problem. Unfortunately, I am a beginner with neural networks and Python. I want to open this CSV file and use 70% of the data for training and 30% for testing.

2 Answers


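Reading in the data mostly works, but note that csv.reader returns every field as a string, so the values need converting to numbers before they go to Keras.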
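A minimal sketch of reading the file with numeric conversion, since csv.reader hands back every field as a string. It assumes voice.csv's layout (a header row, features first, label in the last column); the inline sample rows are made-up values, truncated to three feature columns:

```python
import csv
import io

# Hypothetical stand-in for voice.csv: header plus two made-up rows,
# truncated to three feature columns and the label column.
sample = io.StringIO(
    "meanfreq,sd,median,label\n"
    "0.0598,0.0642,0.0320,male\n"
    "0.1838,0.0405,0.1842,female\n"
)

meanfreq, sd, median, label = [], [], [], []
reader = csv.reader(sample)
next(reader)  # skip the header row
for row in reader:
    # csv.reader returns strings; convert the numeric fields explicitly
    meanfreq.append(float(row[0]))
    sd.append(float(row[1]))
    median.append(float(row[2]))
    # the label is the last column (index 20 in the real 21-column file)
    label.append(1 if row[-1] == "female" else 0)

print(meanfreq, label)  # [0.0598, 0.1838] [0, 1]
```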

I imagine you have a data set that looks like this:

mean_freq, label
.12         0
.45         1

And you want to train a classifier. Currently the model expects each training example to have 2800 features (input_shape=(2800,)), but you only want one feature: mean_freq.

The mistake here is that you are trying to tell Keras how many training examples to use when declaring the model. You don't do that there; the number of examples comes from the data you pass later, when you fit the model.

So the input_shape of the Keras Dense layer should be (1,) for a single feature. If you use both mean and median frequency, you want two features, (2,), and so on.
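A quick NumPy check of the shapes involved. The first axis of the data is the batch dimension (the None in the error message), and input_shape has to match the remaining axes. The arrays below are random placeholders for the 3168 rows of voice.csv:

```python
import numpy as np

n_samples = 3168  # rows in voice.csv, per the comments on this answer

# One feature per sample: a flat vector of shape (3168,)
# becomes a single-column matrix with reshape(-1, 1).
meanfreq = np.random.rand(n_samples)        # placeholder values
X_one = meanfreq.reshape(-1, 1)
print(meanfreq.shape, X_one.shape)          # (3168,) (3168, 1)

# Twenty features per sample: one row per sample, one column per feature.
X_twenty = np.random.rand(n_samples, 20)    # placeholder values
print(X_twenty.shape)                       # (3168, 20)

# input_shape must equal everything after the batch axis.
assert X_one.shape[1:] == (1,)
assert X_twenty.shape[1:] == (20,)
```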

# note change from 2800 to 1
network.add(layers.Dense(512, activation='relu', input_shape=(1,)))

You can split your data into training and test sets in several ways. My suggestion is something like this:

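# convert the string lists to numeric arrays: X has shape (n_samples, 1)
X = np.array(meanfreq, dtype=float).reshape(-1, 1)
y = np.array(label)

train_size = 2800
X_train = X[:train_size]
y_train = y[:train_size]
X_test = X[train_size:]
y_test = y[train_size:]  # test labels must come from the same rows as X_test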
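If the rows in the file are grouped by label (worth checking in voice.csv), an unshuffled slice like the one above can leave mostly one class in the test set. A minimal NumPy sketch of a shuffled 70/30 split, the ratio the question asked for; X and y are small placeholder arrays, and the real 3168 rows would be handled the same way:

```python
import numpy as np

# Placeholder data: 10 samples with 1 feature, labels grouped by class
# (the situation where an unshuffled slice goes wrong).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)

rng = np.random.default_rng(42)          # fixed seed for a repeatable split
perm = rng.permutation(len(X))           # shuffled row indices
train_size = len(X) * 7 // 10            # 70% train, 30% test

train_idx, test_idx = perm[:train_size], perm[train_size:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)       # (7, 1) (3, 1)
```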

Then fit the model on the training set and evaluate it on the test set.

network.fit(X_train, y_train, epochs=5, batch_size=128)
scores = network.evaluate(X_test, y_test)
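One more thing worth checking with this template: the question compiles a single sigmoid output unit with loss='categorical_crossentropy'. Categorical cross-entropy is meant for several mutually exclusive output units. As far as I know, Keras rescales the predictions to sum to 1 across the last axis before taking the log, and with one unit that rescaled value is always 1, so the loss is always 0 and the network has nothing to learn from. For one sigmoid unit, loss='binary_crossentropy' is the usual choice. The NumPy sketch below reimplements both formulas by hand; it mirrors the math, not Keras's own code:

```python
import numpy as np

EPS = 1e-7  # keeps log() away from 0

def categorical_ce(y_true, y_pred):
    """Mean categorical cross-entropy over the batch."""
    # rescale each row so the predictions sum to 1 across the last axis
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    y_pred = np.clip(y_pred, EPS, 1.0)
    return float(-(y_true * np.log(y_pred)).sum(axis=-1).mean())

def binary_ce(y_true, y_pred):
    """Mean binary cross-entropy for a single sigmoid output."""
    y_pred = np.clip(y_pred, EPS, 1.0 - EPS)
    return float(-(y_true * np.log(y_pred)
                   + (1 - y_true) * np.log(1 - y_pred)).mean())

# One sigmoid unit per sample, as in the question's model.
y_true = np.array([[1.0], [0.0], [1.0]])
bad_pred = np.array([[0.1], [0.9], [0.2]])    # confidently wrong
good_pred = np.array([[0.9], [0.1], [0.8]])   # mostly right

# Categorical CE cannot tell good from bad predictions with one unit.
print(categorical_ce(y_true, bad_pred) == 0.0,
      categorical_ce(y_true, good_pred) == 0.0)           # True True
print(binary_ce(y_true, bad_pred) > binary_ce(y_true, good_pred))  # True
```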

Edit to reflect comments:

Well, if your training data has 20 features, you tell Keras that with:

# note change from 2800 to 20
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
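To see why input_shape must equal the number of features: a Dense layer computes activation(x @ W + b), with a kernel W that has one row per input feature. A NumPy sketch of the forward pass of the question's two-layer network with 20 inputs; the random weights stand in for trained ones, and the batch of 4 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden = 20, 512

# Each Dense kernel has one row per input feature; input_shape fixes that width.
W1 = rng.normal(scale=0.05, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.05, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(x):
    """Forward pass of Dense(512, relu) followed by Dense(1, sigmoid)."""
    h = np.maximum(x @ W1 + b1, 0.0)              # relu hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output in (0, 1)

batch = rng.normal(size=(4, n_features))          # 4 samples, 20 features each
print(forward(batch).shape)                       # (4, 1)

# A batch with a different number of columns cannot be multiplied by W1.
try:
    forward(rng.normal(size=(4, 1)))
except ValueError as err:
    print("shape mismatch:", err)
```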

You have to do the work needed to get the data into the shape you need for training and testing, but the template above is how you would fit and evaluate the model.
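A minimal sketch of that reshaping for a 21-column file like the one described in the comments (20 feature columns, then the label). The header and rows below are made-up placeholders, not real voice.csv values:

```python
import csv
import io
import numpy as np

# Hypothetical 21-column file: 20 numeric feature columns, then the label.
header = ",".join([f"f{i}" for i in range(20)] + ["label"])
rows = [",".join(["0.5"] * 20 + ["male"]),
        ",".join(["0.7"] * 20 + ["female"])]
csv_text = "\n".join([header] + rows)

records = list(csv.reader(io.StringIO(csv_text)))[1:]  # drop the header row
raw = np.array(records)                      # 2-D array of strings, shape (n, 21)

X = raw[:, :20].astype(float)                # features: shape (n_samples, 20)
y = (raw[:, 20] == "female").astype(int)     # female -> 1, as in the question

print(X.shape, y)                            # (2, 20) [0 1]
```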

I would also note that there are better ways to read in CSV data if you're going to do modeling (as you are): look at using a pandas DataFrame. There are also better (more standard) ways of creating a train/test split: look into sklearn's train_test_split.
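A minimal pandas sketch of the first suggestion. The inline CSV is a made-up miniature of voice.csv (two feature columns instead of twenty). The split uses DataFrame.sample to stay within pandas; sklearn's train_test_split(X, y, train_size=0.7) would be the more standard route:

```python
import io
import pandas as pd

# Hypothetical miniature of voice.csv: two features plus the label column.
csv_text = """meanfreq,sd,label
0.10,0.05,male
0.12,0.06,male
0.18,0.04,female
0.19,0.03,female
0.11,0.05,male
0.17,0.04,female
0.13,0.06,male
0.20,0.03,female
0.14,0.05,male
0.16,0.04,female
"""
df = pd.read_csv(io.StringIO(csv_text))

# Shuffled 70/30 split: sample 70% of the rows, the rest form the test set.
train = df.sample(frac=0.7, random_state=42)
test = df.drop(train.index)

def features_and_labels(frame):
    X = frame.drop(columns="label").to_numpy(dtype=float)   # all columns except the label
    y = (frame["label"] == "female").astype(int).to_numpy()  # female -> 1
    return X, y

X_train, y_train = features_and_labels(train)
X_test, y_test = features_and_labels(test)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```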

Edit 2: A quick model of the voice data

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Dense, Input

# get data ready
data = pd.read_csv('voice.csv')
print(data.shape)  # (3168, 21) for voice.csv
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
              X, y_,  train_size=0.80, random_state=42)

# model using keras functional style
inp = Input(shape =(20, ))
dense = Dense(128)(inp)
out = Dense(2, activation='sigmoid')(dense)
model = Model(inputs=[inp], outputs=[out])
model.compile(loss='binary_crossentropy', optimizer='adam', 
             metrics=['accuracy'])
model.fit(X_train, y_train, epochs=100, batch_size=128)
model.evaluate(X_test, y_test)
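What the OneHotEncoder step above produces for the 0/1 labels: one column per class, in sorted order of the category values, so after the np.where mapping column 0 is female (0) and column 1 is male (1). A NumPy-only sketch of the same encoding via np.eye, on a made-up label vector:

```python
import numpy as np

labels = np.array(["male", "female", "female", "male"])  # made-up labels
y = np.where(labels == "male", 1, 0)                      # same mapping as Edit 2

# Row i of the 2x2 identity matrix is the one-hot vector for class i.
y_onehot = np.eye(2)[y]
print(y_onehot)
# [[0. 1.]
#  [1. 0.]
#  [1. 0.]
#  [0. 1.]]

# argmax recovers the original integer labels.
assert (y_onehot.argmax(axis=1) == y).all()
```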
parsethis
  • My CSV file has 21 columns and 3168 rows. The first 20 columns are characteristics of the voice; the last (21st) column is the result, female or male. I need to use the first 20 values and the result for training. –  Mar 29 '18 at 18:24
  • The Edit 2 code gives this error: "NameError: name 'Input' is not defined" –  Mar 30 '18 at 00:27

Yes, it works like this:

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from keras import models, layers  # needed for the Sequential model below

# get data ready
data = pd.read_csv('voice.csv')
print(data.shape)  # (3168, 21) for voice.csv
# split out features and label
X = data.iloc[:, :-1].values
y = data.iloc[:, -1]
# map category to binary
y = np.where(y == 'male', 1, 0)
enc = OneHotEncoder()
# reshape y to be column vector
y_ = enc.fit_transform(y.reshape(-1, 1)).toarray()
X_train, X_test, y_train, y_test = train_test_split(
              X, y_,  train_size=0.80, random_state=42)

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(20,)))
network.add(layers.Dense(2, activation='softmax'))  # softmax matches categorical_crossentropy

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])


network.fit(X_train, y_train, epochs=100, batch_size=128)
network.evaluate(X_test, y_test)
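Why categorical_crossentropy with two one-hot outputs goes naturally with a softmax activation: the loss assumes each output row is a probability distribution over the classes, and softmax produces exactly that. A NumPy sketch of the math on made-up scores (it mirrors the formulas, not Keras's internals):

```python
import numpy as np

def softmax(z):
    # subtract the row max for numerical stability; rows then sum to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def categorical_ce(y_true, probs):
    # mean over the batch of -sum(y * log(p))
    return float(-(y_true * np.log(probs)).sum(axis=-1).mean())

logits = np.array([[2.0, -1.0],    # made-up raw scores for two samples
                   [0.5, 1.5]])
y_true = np.array([[1.0, 0.0],     # one-hot targets, as produced by OneHotEncoder
                   [0.0, 1.0]])

probs = softmax(logits)
print(probs.sum(axis=1))           # [1. 1.]
print(categorical_ce(y_true, probs))
```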