
I am learning Python deep learning tools from the official TensorFlow website.

I am trying to build several text-classification networks by following the tutorials, but the LSTM does not work as expected.

import numpy as np
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras import utils
from tensorflow.keras import losses
import matplotlib.pyplot as plt


seed = 42
BATCH_SIZE = 64

train_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/train',
    validation_split=0.2,
    subset='training',
    batch_size=BATCH_SIZE,
    seed=seed)
val_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/train',
    validation_split=0.2,
    subset='validation',
    batch_size=BATCH_SIZE,
    seed=seed)
test_ds = utils.text_dataset_from_directory(
    'stack_overflow_16k/test',
    batch_size=BATCH_SIZE)

class_names = train_ds.class_names
train_ds = train_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = val_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
test_ds = test_ds.prefetch(buffer_size=tf.data.AUTOTUNE)


VOCAB_SIZE = 1000
MAX_SEQUENCE_LENGTH = 500

encoder = tf.keras.layers.TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_sequence_length=MAX_SEQUENCE_LENGTH)

encoder.adapt(train_ds.map(lambda text, label: text))


model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(VOCAB_SIZE, 64, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
#     tf.keras.layers.LSTM(128),
#     tf.keras.layers.Dense(64, activation='relu'),
#     tf.keras.layers.Conv1D(64, 5, padding="valid", activation="relu", strides=2),
#     tf.keras.layers.GlobalMaxPooling1D(),
#     tf.keras.layers.GRU(64),
#     tf.keras.layers.SimpleRNN(64),
#     tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(4)
])
model.summary()


model.compile(loss='sparse_categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

history = model.fit(train_ds, epochs=10,
                    validation_data=val_ds)

This is my complete code; the core part is the same as in the tutorials.

But the training output is as follows:
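One thing that may be worth ruling out separately from any platform issue: the last layer, `Dense(4)`, has no activation, so it emits raw logits, while the string loss `'sparse_categorical_crossentropy'` defaults to `from_logits=False` and treats its input as probabilities. The pure-Python sketch below is a simplified model of that mismatch, assuming the loss clips its input into [1e-7, 1 - 1e-7] before taking the log (1e-7 being the Keras default epsilon). The logit values and helper names are made up for illustration.

```python
import math

EPS = 1e-7  # assumed clipping epsilon (Keras default)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def loss_from_logits(logits, label):
    """Cross-entropy when the loss is told its input is logits."""
    return -math.log(softmax(logits)[label])

def loss_logits_as_probs(logits, label):
    """Simplified model of from_logits=False fed raw logits:
    clip into [EPS, 1 - EPS], take the log, then softmax cross-entropy."""
    clipped = [min(max(x, EPS), 1.0 - EPS) for x in logits]
    return -math.log(softmax([math.log(p) for p in clipped])[label])

logits = [2.0, -1.0, -0.5, -3.0]  # hypothetical output of Dense(4)
label = 1                          # hypothetical true class

print(loss_from_logits(logits, label))      # loss with a proper softmax
print(loss_logits_as_probs(logits, label))  # loss after clipping the logits
```

If this turns out to matter, the usual fix is to pass `loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)` to `model.compile()`, or to give the last layer `activation='softmax'`. As far as I understand, values outside the clipping range also receive zero gradient, which could keep the loss from moving.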

Epoch 1/10
100/100 [==============================] - 33s 273ms/step - loss: 9.6882 - accuracy: 0.2562 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 2/10
100/100 [==============================] - 25s 250ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 3/10
100/100 [==============================] - 25s 252ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 4/10
100/100 [==============================] - 25s 254ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 5/10
100/100 [==============================] - 25s 255ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 6/10
100/100 [==============================] - 26s 256ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 7/10
100/100 [==============================] - 26s 257ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 8/10
100/100 [==============================] - 26s 258ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 9/10
100/100 [==============================] - 26s 258ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587
Epoch 10/10
100/100 [==============================] - 26s 256ms/step - loss: 12.1238 - accuracy: 0.2478 - val_loss: 11.9475 - val_accuracy: 0.2587

The accuracy does not increase and the loss does not decrease at all, as if the model were not being trained. The accuracy is also roughly the reciprocal of the number of classes (e.g. for a binary classification problem it would stay around 0.5; for a four-class problem, 0.25).
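As a quick sanity check on the numbers in the log: a four-class model that outputs uniform probabilities would have a cross-entropy of ln 4 ≈ 1.386, so a loss near 12 is far worse than chance. The arithmetic below tests a different guess, which is an assumption and not something the log proves: the model always puts essentially all probability on one class, and the loss clips probabilities at 1e-7 (the Keras default epsilon). Under that guess the mean loss would be roughly (1 - accuracy) × ln(1e7).

```python
import math

NUM_CLASSES = 4
EPS = 1e-7  # assumed clipping epsilon

# Loss of a model that assigns probability 1/4 to every class.
chance_loss = math.log(NUM_CLASSES)

def saturated_loss(accuracy, eps=EPS):
    """Mean cross-entropy if every wrong sample gets probability eps
    and every correct sample gets probability close to 1."""
    return (1.0 - accuracy) * -math.log(eps)

# Accuracy and loss values copied from the training log above.
for name, acc, observed in [("train", 0.2478, 12.1238),
                            ("val", 0.2587, 11.9475)]:
    print(f"{name}: predicted {saturated_loss(acc):.4f}, observed {observed}")
print(f"uniform-guess loss: {chance_loss:.4f}")
```

If the predicted and observed values agree closely, that would suggest the outputs are saturated (or invalid) and not merely undertrained.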

Later I compared with a CNN: I just changed the LSTM layers to CNN layers as in the tutorials, and it works as expected (same dataset, same arguments to model.compile() and model.fit()).

I also tried a GRU; the same problem occurs.

I don't get it.

Am I missing some configuration for RNN-like models? Can somebody help me with this problem? Thanks!

P.S.

I tried changing the optimizer (sgd, adam) and the learning rate; it does not help. It does not look like overfitting.

Methods I tried:


Update 2023-01-30

I ran the same code on my Linux server and it works as expected. It may be a bug in tensorflow-macos.


Update 2023-02-01

I tried the official version of TensorFlow for macOS on M1, installed with just

conda install tensorflow

and it works. So the problem appears to be in the GPU support of tensorflow-macos: when I ran tensorflow-macos again on the CPU only, it also worked.
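For reference, one way to run on the CPU only is to hide the GPU from TensorFlow before any model or tensor is created. This is a minimal sketch using `tf.config.set_visible_devices`; the helper name `use_cpu_only` is made up, and it is an assumption that this matches how the CPU-only run above was done.

```python
def use_cpu_only():
    """Hide all GPUs from TensorFlow; call before building any model."""
    import tensorflow as tf  # imported here so the call order is explicit

    # An empty device list makes every GPU invisible to the runtime.
    tf.config.set_visible_devices([], "GPU")
    # Return what TensorFlow can still see, to confirm only CPUs remain.
    return tf.config.get_visible_devices()
```

Calling `use_cpu_only()` at the top of the script, before `text_dataset_from_directory` and the model definition, should leave only CPU devices in the returned list.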

Conclusion:

RNN-like models have a problem on tensorflow-macos with the GPU.
