I want to feed a sparse tensor as input to a simple TensorFlow-based recurrent neural network. For that I use the following code:
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential()
model.add(keras.Input(shape=(1, features_dim), sparse=True))
model.add(keras.layers.RNN(keras.layers.SimpleRNNCell(hidden_layer_dim, activation="sigmoid")))
model.add(keras.layers.Dense(labels_dim, activation="softmax"))
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.0, nesterov=False),
    loss="sparse_categorical_crossentropy",
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
model.fit(
    training_data,
    training_labels,
    epochs=num_epochs,
    batch_size=batch_size,
    shuffle=True,
)
Here, features_dim denotes the number of features, training_labels is a NumPy array containing the respective labels, and training_data is a sparse tensor created from a sparse matrix in COO format, with shape (num_entries, 1, num_features).
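Roughly, the conversion looks like this (a simplified, self-contained sketch with toy dimensions; in my real code the COO matrix comes from my preprocessing step):

import numpy as np
import scipy.sparse as sp
import tensorflow as tf

num_entries, features_dim = 1000, 500  # toy values, only for illustration
coo = sp.random(num_entries, features_dim, density=0.01,
                format="coo", dtype=np.float64)  # stand-in for my real feature matrix

# Lift the 2-D COO indices (row, col) to 3-D (row, 0, col) so the resulting
# SparseTensor has shape (num_entries, 1, features_dim).
indices = np.column_stack((coo.row, np.zeros_like(coo.row), coo.col)).astype(np.int64)
training_data = tf.sparse.SparseTensor(
    indices=indices,
    values=coo.data,
    dense_shape=(num_entries, 1, features_dim),
)
training_data = tf.sparse.reorder(training_data)  # canonical index ordering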
However, when I try to run this, I get the following error:
TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("cond_1/Identity:0", shape=(None, 3), dtype=int64, device=/job:localhost/replica:0/task:0/device:CPU:0), values=Tensor("cond_1/Identity_1:0", shape=(None,), dtype=float64, device=/job:localhost/replica:0/task:0/device:CPU:0), dense_shape=Tensor("stack:0", shape=(3,), dtype=int64, device=/job:localhost/replica:0/task:0/device:CPU:0)). Consider casting elements to a supported type.
I'm a bit lost about this error. I was under the impression that TensorFlow can handle sparse data without having to convert it. Furthermore, if a conversion is necessary, I'm not sure why it fails. The code above (without sparse=True) runs just fine if I first convert the SparseTensor to a dense tensor with tf.sparse.to_dense(), so there does not seem to be anything wrong with how I created the SparseTensor from the COO matrix (at least not as far as I can tell).
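For completeness, this is the workaround that does run, at the cost of materialising the whole tensor densely up front (the variable name dense_training_data is just for this sketch):

# Works, but only because the data are no longer sparse
# (the Input layer is then created without sparse=True):
dense_training_data = tf.sparse.to_dense(training_data)
model.fit(
    dense_training_data,
    training_labels,
    epochs=num_epochs,
    batch_size=batch_size,
    shuffle=True,
)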
Edit:
I have now extensively tested the answers provided in Using sparse matrices with Keras and Tensorflow. Unfortunately, neither of them solves my problem without at least sacrificing the sparsity of the individual batches (see the sketch below). Is there no other possibility?
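To make concrete what I mean by sacrificing batch sparsity: the approaches there essentially boil down to a generator that densifies every batch before it reaches the model, roughly like the following (my own paraphrase with a hypothetical helper dense_batch_generator, not the exact code from those answers):

import math
import tensorflow as tf

def dense_batch_generator(sparse_x, labels, batch_size):
    # sparse_x: 3-D tf.sparse.SparseTensor of shape (num_entries, 1, features_dim)
    num_samples = int(sparse_x.dense_shape[0])
    num_features = int(sparse_x.dense_shape[2])
    while True:
        for start in range(0, num_samples, batch_size):
            stop = min(start + batch_size, num_samples)
            batch = tf.sparse.slice(sparse_x,
                                    start=[start, 0, 0],
                                    size=[stop - start, 1, num_features])
            # Every batch handed to the model is dense again, which is
            # exactly the sparsity I was hoping to keep.
            yield tf.sparse.to_dense(batch), labels[start:stop]

model.fit(dense_batch_generator(training_data, training_labels, batch_size),
          steps_per_epoch=math.ceil(num_entries / batch_size),
          epochs=num_epochs)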