
I have to train a neural network that predicts the class of an individual. The dataset is about traffic accidents in Barcelona, so it has both categorical and numerical features. To train the neural network, I have built a model that contains an embedding layer for every categorical column. However, when I try to fit my model, the following error appears:

      1 m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
----> 2 m.fit(dd_normalized, dummy_y)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

My code is the following:

import numpy as np
import pandas as pd
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder

dd = pd.read_csv("C:/Users/Hussnain Shafqat/Desktop/Uni/Q8/TFG/Bases de dades/Modified/2021_Accidents_Final.csv")
dd_features = dd.copy()

Y = dd_features.pop('TipoAcc') #my target variable

# Min-max normalization of the numerical variables
dd_normalized = dd_features.copy()
normalize_var_names = ["Long", "Lat", "NLesLeves", "NLesGraves", "NVictimas", "NVehiculos", "ACarne"] 
for name, column in dd_features.items():
    if name in normalize_var_names:
        print(f"Normalizando {name}")
        dd_normalized[name] = (dd_features[name] - dd_features[name].min()) / (dd_features[name].max() - dd_features[name].min())

dd_normalized = dd_normalized.replace({'VictMortales': {'Si': 1, 'No': 0}})  

#Neural network model creation
def get_model(df):
    names = df.columns
    inputs = []
    outputs = []
    for col in names:
        if col in normalize_var_names:
            inp = layers.Input(shape=(1,), name = col)
            inputs.append(inp)
            outputs.append(inp)
        else:
            num_unique_vals = int(df[col].nunique())
            embedding_size = int(min(np.ceil(num_unique_vals/2), 600))
            inp = layers.Input(shape=(1,), name = col)
            out = layers.Embedding(num_unique_vals + 1, embedding_size, name = col+"_emb")(inp)
            out = layers.Reshape(target_shape = (embedding_size,))(out)
            inputs.append(inp)
            outputs.append(out)
    x = layers.Concatenate()(outputs)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation ='relu')(x)
    y = layers.Dense(15, activation = 'softmax')(x)
    model = Model(inputs=inputs, outputs = y)
    return model

m = get_model(dd_normalized)

# One-hot encode the target variable
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
dummy_y = np_utils.to_categorical(encoded_Y)

#Model training
m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
m.fit(dd_normalized, dummy_y)

I have tried to convert my dataset into a tensor using tf.convert_to_tensor, but the same error appears. After some research, I found out that the same error appears whenever I try to convert a DataFrame with both categorical and numerical columns to a tensor. If I apply the function to only the categorical columns or only the numerical ones, it works fine. I know that I can't feed categorical data to a neural network directly; however, I think the embedding layers should be enough to solve that.
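
Is this the kind of integer encoding I am missing before calling fit? A minimal sketch of what I mean (the cat.codes step and the dict feeding are my guesses, not code I have verified):

# Guess: integer-encode the object-dtype (categorical) columns first...
cat_cols = [c for c in dd_normalized.columns if c not in normalize_var_names]
for c in cat_cols:
    dd_normalized[c] = dd_normalized[c].astype('category').cat.codes  # object -> int

# ...then feed the multi-input model a dict keyed by the Input layer names
m.fit({c: dd_normalized[c].to_numpy() for c in dd_normalized.columns}, dummy_y)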

I have also tried this solution, but it doesn't work. Any idea what the problem could be?

  • I just noticed that all the inputs of my model are float32, while I'm feeding it objects for the categorical variables. Is there any way to define the input dtype for embedding layers? – Hasnain Shafqat Apr 30 '22 at 23:10
  • Could you provide some sample data? – elbe May 01 '22 at 09:58
  • Could you also display the error message using run_eagerly=True in the model compilation? – elbe May 01 '22 at 10:03

1 Answer


You can handle both categorical and numerical data by converting the strings into numbers or column indices (see the decoder below); after that it is simply a matter of training the network.

[ Sample ]:

import tensorflow as tf
import pandas as pd

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# Sample sentence "I love cats", spelled out character by character below

vocab = [ "a", "b", "c", "d", "e", "f", "g", "h", "I", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "_" ]
data = tf.constant([["_", "_", "_", "I"], ["l", "o", "v", "e"], ["c", "a", "t", "s"]])

layer = tf.keras.layers.StringLookup(vocabulary=vocab)
sequences_mapping_string = layer(data)
sequences_mapping_string = tf.reshape( sequences_mapping_string, shape=(1, 12) )
print( 'result: ' + str( sequences_mapping_string ) )
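
# Decoder mentioned in the text above: invert the lookup to map the integer
# ids back to strings (an added illustration; StringLookup supports invert=True)
inverse_layer = tf.keras.layers.StringLookup(vocabulary=vocab, invert=True)
print( 'decoded: ' + str( inverse_layer(sequences_mapping_string) ) )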

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# The answerer's local spreadsheet; some cells contain the marker "X"
variables = pd.read_excel('F:\\temp\\20220305\\Book 2.xlsx', index_col=None, header=None)

print(variables)
print(tf.constant(variables).shape)

list_of_X = [ ]
list_of_Y = [ ]

# Scan the spreadsheet grid: label 1 where a cell contains "X", else 0
n_rows, n_cols = variables.shape
for i in range(n_rows):
    for j in range(n_cols):
        if variables[j][i] == "X" :
            print('found: ' + str(i) + ":" + str(j))
            list_of_X.append(i)
            list_of_Y.append(1)
        else :
            list_of_X.append(i)
            list_of_Y.append(0)

# Append the character ids so the training data mixes both sources
for v in sequences_mapping_string.numpy()[0]:
    list_of_X.append(v)
    list_of_Y.append(v)

# 48 = number of grid cells plus the 12 character ids (depends on the spreadsheet)
list_of_X = tf.reshape( tf.cast( list_of_X, dtype=tf.int32 ), shape=(1, 48, 1) )
list_of_Y = tf.reshape( tf.cast( list_of_Y, dtype=tf.int32 ), shape=(1, 48, 1) )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(48, 1)),

    # Two stacked bidirectional LSTMs read the 48-step sequence
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

# Final linear unit rescales the sigmoid output for the regression target
model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['mean_squared_error'])
              
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""             
history = model.fit(list_of_X, list_of_Y, epochs=10, batch_size=4)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Predict
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
result = model.predict( tf.zeros([1, 48, 1]).numpy() )
print( 'result: ' + str(result) )
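
Applied to the question's DataFrame, the same idea looks roughly like this (a minimal sketch; the toy column names and the dict feeding are assumptions, not tested on the question's data):

import tensorflow as tf
import pandas as pd

# Toy stand-in for the question's mixed-dtype DataFrame (column names hypothetical)
df = pd.DataFrame({ 'Long': [0.1, 0.5, 0.9], 'Distrito': ['Eixample', 'Gracia', 'Eixample'] })

encoded = { }
for col in df.columns:
    if df[col].dtype == object :
        # One StringLookup per categorical column turns strings into integer ids
        lookup = tf.keras.layers.StringLookup(vocabulary=sorted(df[col].unique()))
        encoded[col] = lookup(df[col].to_numpy())
    else :
        encoded[col] = tf.constant(df[col].to_numpy(), dtype=tf.float32)

# Each dict key must match the name of an Input layer in the question's model
# m.fit(encoded, dummy_y)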