
So, here is what I am trying to do. My model has to receive a number of training samples, each a conjunction of Boolean literals (i.e. a vector of 0s and 1s) assigned a truth value. Learning from the samples, it must be able to receive some test vector and determine its truth value.

More concretely, a vector of 0s and 1s such as V = [1,0,0,...,0,1] may be either acceptable or not (labeled 1 or 0). My training sample array contains 15202 such vectors, so it is an array of shape (15202, 20), and the training label array, which holds one label per sample, has shape (15202, 1). That is, the following piece of code

print(np.shape(train_samples))
print(type(train_samples))
print(np.shape(train_labels))
print(type(train_labels))

gives the results:

(15202, 20)
<class 'numpy.ndarray'>
(15202, 1)
<class 'numpy.ndarray'>

The rest of the code is as follows:

import numpy as np
from random import randint
from sklearn.utils import shuffle
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
#Main Code
# Randomly generated samples and labels, for illustration only
train_samples = np.random.randint(2, size=(15202,20))
train_labels = np.random.randint(2, size=(15202,1))
#--------

train_labels, train_samples = shuffle(train_labels, train_samples)
scaler = MinMaxScaler(feature_range=(0,1))
scaled_train_samples = scaler.fit_transform(train_samples.reshape(-1,1))
model = Sequential([
    Dense(units=16, input_shape=(1,), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax')
])
model.compile(optimizer=Adam(learning_rate=0.0001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=30, verbose=2)

The final line causes an error:

ValueError: Data cardinality is ambiguous:
  x sizes: 304040
  y sizes: 15202
Please provide data which shares the same first dimension.

I notice that the reported x size (304040) is actually 15202 times 20. So what am I doing wrong here, and how can I fix that? Thanks.
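To confirm where 304040 comes from, here is a minimal NumPy sketch (using randomly generated stand-in data of the same shape as my real samples):

```python
import numpy as np

# Stand-in for the real training data, matching the reported shape
train_samples = np.random.randint(2, size=(15202, 20))

# One row per sample, as fit() expects
print(train_samples.shape)                 # (15202, 20)

# reshape(-1, 1) stacks all 20 features into a single column,
# giving 15202 * 20 = 304040 rows -- the "x sizes: 304040" in the error
print(train_samples.reshape(-1, 1).shape)  # (304040, 1)
```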

Neutrino
  • In your own words, when you do `scaler.fit_transform(train_samples.reshape(-1,1))`, what are you expecting the `.reshape` part to accomplish? – Karl Knechtel Nov 18 '20 at 09:45
  • I added it because fit_transform() does not accept a 1-dimensional vector. Removing that part, i.e. having only ```scaled_train_samples = scaler.fit_transform(train_samples)``` causes another error: ```Input 0 of layer sequential_8 is incompatible with the layer: expected axis -1 of input shape to have value 1 but received input with shape [None, 20]``` – Neutrino Nov 18 '20 at 09:50

1 Answer


You must use:

scaled_train_samples = scaler.fit_transform(train_samples)

i.e. without the reshape(-1,1).

Now the scaled data has shape (15202, 20) and the labels have shape (15202, 1), so the first dimensions match.

Also, change input_shape=(1,) to input_shape=(train_samples.shape[1],).

The data passed to fit() has shape (batch_size, input_dim), while input_shape only specifies the per-sample part, (input_dim,). In your case input_dim is 20 (the 20 columns).
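Putting both fixes together, the pipeline looks roughly like this (epochs reduced to 1 for a quick check; otherwise your hyperparameters are kept, and the data is again a random stand-in):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Random stand-in data with the same shapes as yours
train_samples = np.random.randint(2, size=(15202, 20))
train_labels = np.random.randint(2, size=(15202, 1))

# No reshape: the scaler keeps one row per sample, shape (15202, 20)
scaled_train_samples = MinMaxScaler(feature_range=(0, 1)).fit_transform(train_samples)

model = Sequential([
    # input_shape is the per-sample shape: 20 features, batch dim excluded
    Dense(units=16, input_shape=(train_samples.shape[1],), activation='relu'),
    Dense(units=32, activation='relu'),
    Dense(units=2, activation='softmax'),
])
model.compile(optimizer=Adam(learning_rate=0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x=scaled_train_samples, y=train_labels, batch_size=10, epochs=1, verbose=0)
```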


George
  • Thanks. Now I am having this error: ValueError: Input 0 of layer sequential_8 is incompatible with the layer: expected axis -1 of input shape to have value 1 but received input with shape [None, 20] – Neutrino Nov 18 '20 at 09:40
  • @Neutrino:Your code runs fine! Are you sure you haven't changed anything else? – George Nov 18 '20 at 09:50
  • I just replaced the original line with ```scaled_train_samples = scaler.fit_transform(train_samples)```. Did it again, same error as reported in the comment above. – Neutrino Nov 18 '20 at 09:54
  • Perfect. Worked like a charm, I guess I see what I was doing wrong. Thanks. – Neutrino Nov 18 '20 at 10:06