I implemented a bidirectional Long Short-Term Memory network with a Conditional Random Field layer (BiLSTM-CRF) using keras and keras_contrib (the latter for the CRF, which is not part of native keras functionality). The task is Named Entity Recognition: each token is classified into one of 6 classes. The input to the network is a sequence of 300-dimensional pretrained GloVe word embeddings. This is my model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 648) 0
_________________________________________________________________
embedding_1 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_1 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_1 (CRF) (None, 648, 6) 6054
=================================================================
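As a sanity check on the summary, the parameter counts can be reproduced by hand. The sketch below assumes a standard LSTM (four gates, each with an input kernel, a recurrent kernel, and a bias) and assumes the CRF layer holds an emission kernel, a transition matrix, a bias, and two boundary vectors. The function names are hypothetical helpers, not library calls. Under those assumptions the counts match a 5000-row embedding matrix and 500 LSTM units per direction, so a BiLSTM output width of 1000.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias.
    return 4 * (units * (input_dim + units) + units)


def crf_params(input_dim, num_tags):
    # Assumed CRF weights: emission kernel, transition matrix, bias,
    # plus left and right boundary vectors.
    kernel = input_dim * num_tags
    transitions = num_tags * num_tags
    bias = num_tags
    boundaries = 2 * num_tags
    return kernel + transitions + bias + boundaries


embedding = 5000 * 300               # vocabulary rows x embedding size
bilstm = 2 * lstm_params(300, 500)   # forward + backward direction
crf = crf_params(1000, 6)            # input is the concatenated BiLSTM output

print(embedding, bilstm, crf)  # 1500000 3204000 6054
```

Note that these sizes (5000 words, 500 units) differ from MAX_WORDS = 50000 and HIDDEN_SIZE = 512 in the TensorFlow code, so the summaries and the code constants may not come from the same run.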
Now I want to implement the same model in TensorFlow 1.15. Since the keras_contrib CRF module works only in keras and not in TensorFlow, I used the CRF implementation built for TensorFlow 1.X from this repo. The repo includes two example implementations of the CRF here, but each produces a different error when trained on my data.
Implementation 1
from tensorflow.keras.layers import Bidirectional, Embedding, LSTM, TimeDistributed
from tensorflow.keras.models import Sequential
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss
from tf_crf_layer.metrics import crf_accuracy
MAX_WORDS = 50000
EMBEDDING_LENGTH = 300
MAX_SEQUENCE_LENGTH = 648
HIDDEN_SIZE = 512
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels)))
model.compile('adam', loss=crf_loss, metrics=[crf_accuracy])
This is the error I get when I try to compile the model:
File "/.../tf_crf_layer/metrics/crf_accuracy.py", line 48, in crf_accuracy
crf, idx = y_pred._keras_history[:2]
AttributeError: 'Tensor' object has no attribute '_keras_history'
The error arises when computing crf_accuracy from the repo mentioned above:
def crf_accuracy(y_true, y_pred):
    """
    Get default accuracy based on CRF `test_mode`.
    """
    import pdb; pdb.set_trace()
    crf, idx = y_pred._keras_history[:2]
    if crf.test_mode == 'viterbi':
        return crf_viterbi_accuracy(y_true, y_pred)
    else:
        return crf_marginal_accuracy(y_true, y_pred)
Apparently this kind of error happens when a tensor object is not the output of a keras layer, as per this thread. Why does this error surface here?
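The failing line assumes that y_pred carries a `_keras_history` attribute. To illustrate the difference between the two cases, here is a minimal sketch of a guarded lookup. It uses stand-in objects rather than real TensorFlow tensors, and the helper name `producing_layer` is hypothetical (it is not part of tf_crf_layer).

```python
from types import SimpleNamespace


def producing_layer(tensor):
    """Return the layer recorded in `_keras_history`, or None if absent."""
    history = getattr(tensor, "_keras_history", None)
    if history is None:
        return None
    # The first entry of the history tuple is the layer that made the tensor.
    return history[0]


# Stand-ins for illustration only; these are not TensorFlow objects.
fake_crf = SimpleNamespace(test_mode="viterbi")
layer_output = SimpleNamespace(_keras_history=(fake_crf, 0, 0))
plain_tensor = SimpleNamespace()

print(producing_layer(layer_output).test_mode)  # viterbi
print(producing_layer(plain_tensor))            # None
```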
Implementation 2
from tf_crf_layer.layer import CRF
from tf_crf_layer.loss import crf_loss, ConditionalRandomFieldLoss
from tf_crf_layer.metrics import crf_accuracy
from tf_crf_layer.metrics.sequence_span_accuracy import SequenceSpanAccuracy
model = Sequential()
model.add(Embedding(MAX_WORDS, EMBEDDING_LENGTH, input_length=MAX_SEQUENCE_LENGTH, mask_zero=True, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(HIDDEN_SIZE, return_sequences=True)))
model.add(CRF(len(labels), name="crf_layer"))
model.summary()
crf_loss_instance = ConditionalRandomFieldLoss()
model.compile(loss={"crf_layer": crf_loss_instance}, optimizer='adam', metrics=[SequenceSpanAccuracy()])
Here the model compiles, but as soon as the first epoch of training begins, this error surfaces:
InvalidArgumentError: Expected begin and size arguments to be 1-D tensors of size 3, but got shapes [2] and [2] instead.
[[{{node loss_4/crf_layer_loss/Slice_1}}]]
I'm training the model using mini-batches; could that explain the error? I also noticed that the output shape of the CRF layer in this model summary lacks a dimension (compare the CRF layer in the summary above with the one in the summary below), although the number of parameters for that layer is the same. What is causing this mismatch, and how can it be fixed?
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_5 (Embedding) (None, 648, 300) 1500000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 648, 1000) 3204000
_________________________________________________________________
crf_layer (CRF) (None, 648) 6054
=================================================================
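One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: this summary reports the CRF output as (None, 648), which looks like one tag id per token, whereas the keras_contrib layer reported (None, 648, 6). If the training labels are still one-hot encoded with shape (batch, 648, 6), they have one more dimension than the layer output, which would be consistent with a rank mismatch in a slice operation. The sketch below uses made-up data and hypothetical variable names to show how one-hot labels can be converted to integer tag ids so that the ranks agree.

```python
import numpy as np

NUM_TAGS = 6
MAX_SEQUENCE_LENGTH = 648
batch_size = 4  # arbitrary size, for illustration only

# Made-up integer tag ids standing in for real NER labels.
rng = np.random.default_rng(0)
tag_ids = rng.integers(0, NUM_TAGS, size=(batch_size, MAX_SEQUENCE_LENGTH))

# One-hot labels, as a (batch, seq, tags) model output would require.
y_onehot = np.eye(NUM_TAGS, dtype="float32")[tag_ids]

# Collapse the last axis back to one tag id per token.
y_sparse = y_onehot.argmax(axis=-1)

print(y_onehot.shape)  # (4, 648, 6)
print(y_sparse.shape)  # (4, 648)
```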