0

Following a paper, I'm using word embeddings as a feature vector for entity recognition.

I've attempted to architect the network using Keras but have run into a dimensionality problem I cannot seem to resolve.

Take the following example sentence:

["I went to the shop"]

The sentence has 5 words, and after computing the feature matrix, I am left with a matrix of dimension: (1, 120, 1000) == (#examples, sequence_length, embedding).

Note that sequence_length appends 0. padding when not complete. In this example, the actual sequence_length would be 5.

My network architecture is as follows:

enc = encode() 
claims_input = Input(shape=(120, 1000), dtype='float32', name='claims')
x = Masking(mask_value=0., input_shape=(120, 1000))(claims_input)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
x = Bidirectional(LSTM(units=512, return_sequences=True, recurrent_dropout=0.2, dropout=0.2))(x)
out = TimeDistributed(Dense(8, activation="softmax"))(x)

model = Model(inputs=claims_input, output=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer='adam', metrics=["accuracy"])
model.fit(enc, y)

The architecture is straight forward, I mask specific time steps, run two bidirectional LSTMs, followed by a softmax output. My y variable in this case, is a (9,8) one-hot-encoded matrix corresponding to the gold label of each word.

When trying to fit() this model, I am running into a dimensionality problem relating to the TimeDistributed() layer and I'm unsure how to resolve, or even begin to debug this.

Error: ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (9, 8)

Any help would be appreciated.

today
  • 32,602
  • 8
  • 95
  • 115
methhead
  • 115
  • 2
  • 12

1 Answers1

0

You are doing entity recognition. So each element in your input sequence will be assigned an entity (probably some of them as null). If your model takes an input sample of shape (120, n_features), then the output must also be a sequence of length of 120, i.e. one entity for each element. Therefore, the labels, i.e. y, you provide to the model must have a shape of (n_samples, 120, n_entities) (or (n_samples, 120, 1) if you are using sparse labeling).

Side note: There is no difference between TimeDistributed(Dense(...)) and Dense(...), as the Dense layer is applied on the last axis.

today
  • 32,602
  • 8
  • 95
  • 115