Following a paper, I'm using word embeddings as a feature vector for entity recognition.
I've attempted to architect the network using Keras but have run into a dimensionality problem I cannot seem to resolve.
Take the following example sentence:
["I went to the shop"]
The sentence has 5 words, and after computing the feature matrix, I am left with a matrix of dimension (1, 120, 1000) == (#examples, sequence_length, embedding). Note that sequence_length is zero-padded (0.) when the sentence is shorter than 120 words. In this example, the actual sequence_length would be 5.
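To make the padding concrete, here is a minimal sketch of how I understand the feature matrix to be built. The helper `pad_sequence` and the random stand-in embeddings are hypothetical and only for illustration; my real `encode()` uses the actual word embeddings.

```python
import numpy as np

MAX_LEN = 120   # padded sequence_length from the question
EMB_DIM = 1000  # embedding size from the question

def pad_sequence(vectors, max_len=MAX_LEN):
    """Right-pad a (n_words, emb_dim) matrix with all-zero rows up to max_len."""
    vectors = np.asarray(vectors, dtype="float32")
    padded = np.zeros((max_len, vectors.shape[1]), dtype="float32")
    padded[: len(vectors)] = vectors[:max_len]
    return padded

# Stand-in embeddings for the 5 words of "I went to the shop" (random values).
rng = np.random.default_rng(0)
sentence = rng.normal(size=(5, EMB_DIM))

# Add the leading #examples axis so the result matches (1, 120, 1000).
enc_example = pad_sequence(sentence)[np.newaxis, ...]
print(enc_example.shape)
```

Rows 5 to 119 are all zeros here, which is what the Masking layer with mask_value=0. is meant to skip.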
My network architecture is as follows:
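from keras.layers import Input, Masking, Bidirectional, LSTM, TimeDistributed, Dense
from keras.models import Model

enc = encode()  # my own helper; feature matrix of shape (#examples, 120, 1000)
claims_input = Input(shape=(120, 1000), dtype='float32', name='claims')
# Skip time steps whose feature vector is all zeros (the padding)
x = Masking(mask_value=0., input_shape=(120, 1000))(claims_input)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
# One softmax over the 8 labels per time step
out = TimeDistributed(Dense(8, activation="softmax"))(x)
model = Model(inputs=claims_input, outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(enc, y)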
The architecture is straightforward: I mask the padded time steps, run two bidirectional LSTMs, and finish with a softmax output. My y variable in this case is a (9, 8) one-hot-encoded matrix corresponding to the gold label of each word.
When trying to fit() this model, I run into a dimensionality problem relating to the TimeDistributed() layer, and I'm unsure how to resolve it or even how to begin debugging it.
Error: ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (9, 8)
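My reading of the error is that the output layer produces a 3-D array of shape (#examples, 120, 8), so the target presumably needs 3 dimensions as well. Below is a NumPy-only sketch of what I think that would look like for the sparse loss. This is a guess rather than a confirmed fix: the tag ids are made up, and using 0 as the padding label and (1, 120, 1) as the target shape are assumptions on my part.

```python
import numpy as np

MAX_LEN, N_TAGS = 120, 8

# Stand-in for my y: 9 tokens, one-hot over 8 tags (tag ids are made up).
tag_ids = np.array([0, 3, 3, 1, 7, 2, 0, 5, 4])
y_one_hot = np.eye(N_TAGS, dtype="float32")[tag_ids]   # shape (9, 8)

# Back to integer ids, since the loss is the sparse variant.
ids = y_one_hot.argmax(axis=-1)                         # shape (9,)

# Pad to MAX_LEN (padding id 0 is an assumption), then add the
# #examples axis and a trailing axis so the array is 3-D.
y_padded = np.zeros((1, MAX_LEN, 1), dtype="int32")
y_padded[0, : len(ids), 0] = ids

print(y_one_hot.shape, y_padded.shape)
```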
Any help would be appreciated.