I'm just getting started with TensorFlow and am working on the IMDB dataset. My process is to encode the text with a TextVectorization layer and pass the result to an Embedding layer:
import re
import string
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Create a custom standardization function to strip HTML break tags '<br />'.
def custom_standardization(input_data):
    lowercase = tf.strings.lower(input_data)
    stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
    return tf.strings.regex_replace(stripped_html,
                                    '[%s]' % re.escape(string.punctuation), '')
# Vocabulary size and number of words in a sequence.
vocab_size = 10000
sequence_length = 100
# Use the text vectorization layer to normalize, split, and map strings to
# integers. Note that the layer uses the custom standardization defined above.
# Set output_sequence_length because not all samples are the same length.
vectorize_layer = TextVectorization(
    standardize=custom_standardization,
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=sequence_length)
# Make a text-only dataset (no labels) and call adapt to build the vocabulary.
text_ds = train_ds.map(lambda x, y: x)
vectorize_layer.adapt(text_ds)
I then try to build a model with the functional API:
embedding_dim=16
text_model_catprocess2 = vectorize_layer
text_model_embedd = tf.keras.layers.Embedding(vocab_size, embedding_dim, name = 'embedding')(text_model_catprocess2)
text_model_embed_proc = tf.keras.layers.Lambda(embedding_mean_standard)(text_model_embedd)
text_model_dense1 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_embed_proc)
text_model_dense2 = tf.keras.layers.Dense(2, activation = 'relu')(text_model_dense1)
text_model_output = tf.keras.layers.Dense(1, activation = 'sigmoid')(text_model_dense2)
However, this gives the following error:
~\anaconda3\lib\site-packages\keras\backend.py in dtype(x)
1496
1497 """
-> 1498 return x.dtype.base_dtype.name
1499
1500
AttributeError: Exception encountered when calling layer "embedding" (type Embedding).
'str' object has no attribute 'base_dtype'
Call arguments received:
• inputs=<keras.layers.preprocessing.text_vectorization.TextVectorization object at 0x0000029B483AADC0>
When I build the model with the Sequential API instead, it works fine:
embedding_dim = 16
modelcheck = tf.keras.Sequential([
    vectorize_layer,
    tf.keras.layers.Embedding(vocab_size, embedding_dim, name="embedding"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])
I am not sure why this is happening. Is it necessary for the functional API to have an input? Please help!
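For reference, the traceback shows that the Embedding layer received the TextVectorization layer object itself (inputs=<...TextVectorization object ...>) rather than a tensor. In the functional API each layer is called on a tensor, so the chain normally starts from a tf.keras.Input. The Sequential version avoids this because it only calls its layers once real data is passed in. Below is a minimal sketch of a functional version. It makes several assumptions: sample_texts is a hypothetical toy corpus standing in for the IMDB text, the default standardization is used instead of custom_standardization to keep it short, and GlobalAveragePooling1D is substituted for the Lambda(embedding_mean_standard) step, since that function is not shown in the post.

```python
import tensorflow as tf

# Hypothetical toy corpus standing in for the IMDB reviews (illustration only).
sample_texts = tf.constant(["a great movie", "a terrible movie", "great acting"])

vocab_size = 10000
sequence_length = 100
embedding_dim = 16

# Same configuration as in the question, minus the custom standardization.
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size,
    output_mode='int',
    output_sequence_length=sequence_length)
vectorize_layer.adapt(sample_texts)

# Start from a symbolic string input; every layer is then called on a tensor,
# never on another layer object.
text_input = tf.keras.Input(shape=(1,), dtype=tf.string, name='text')
x = vectorize_layer(text_input)  # integer token ids, padded to sequence_length
x = tf.keras.layers.Embedding(vocab_size, embedding_dim, name='embedding')(x)
# Assumption: average pooling in place of the unseen embedding_mean_standard.
x = tf.keras.layers.GlobalAveragePooling1D()(x)
x = tf.keras.layers.Dense(16, activation='relu')(x)
output = tf.keras.layers.Dense(1, activation='sigmoid')(x)

model = tf.keras.Model(inputs=text_input, outputs=output)

# Run two raw strings through the model; each row is one review.
predictions = model(tf.constant([["a great movie"], ["terrible acting"]]))
```

The key difference from the code in the question is that vectorize_layer is called on text_input, and the tensor it returns is what gets passed to the Embedding layer.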