8

I want to know how to use multilayered bidirectional LSTM in Tensorflow.

I have already implemented a bidirectional LSTM, but I want to compare that model with a multi-layer version.

What code should I add to this part?

x = tf.unstack(tf.transpose(x, perm=[1, 0, 2]))
#print(x[0].get_shape())

# Define lstm cells with tensorflow
# Forward direction cell
lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
# Backward direction cell
lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

# Get lstm cell output
try:
    outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                          dtype=tf.float32)
except Exception: # Old TensorFlow version only returns outputs not states
    outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                    dtype=tf.float32)

# Linear activation, using rnn inner loop last output
outputs = tf.stack(outputs, axis=1)
outputs = tf.reshape(outputs, (batch_size*n_steps, n_hidden*2))
outputs = tf.matmul(outputs, weights['out']) + biases['out']
outputs = tf.reshape(outputs, (batch_size, n_steps, n_classes))
ElSheikh
Gi Yeon Shin

4 Answers

5

You can use two different approaches to build a multilayer BiLSTM model:

1) Use the output of the previous BiLSTM layer as the input to the next one. To begin, create lists of forward and backward cells, each of length num_layers, and then loop over the layers:

# Create one forward and one backward cell per layer
# (n_hidden is the cell size, as in the question).
cell_forw = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]
cell_back = [tf.contrib.rnn.LSTMCell(n_hidden) for _ in range(num_layers)]

# The first layer reads the original input of shape [batch_size, max_time, input_dim];
# each later layer reads the concatenated fw/bw output of the layer below.
output = x

for n in range(num_layers):
    cell_fw = cell_forw[n]
    cell_bw = cell_back[n]

    state_fw = cell_fw.zero_state(batch_size, tf.float32)
    state_bw = cell_bw.zero_state(batch_size, tf.float32)

    (output_fw, output_bw), last_state = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, output,
        initial_state_fw=state_fw,
        initial_state_bw=state_bw,
        scope='BLSTM_' + str(n),
        dtype=tf.float32)

    # [batch_size, max_time, 2 * n_hidden]
    output = tf.concat([output_fw, output_bw], axis=2)

2) Another approach worth a look is the stacked BiLSTM.

  • I tried this and got this error: ValueError: Variable bidirectional_rnn/fw/lstm_cell/kernel already exists, disallowed. Did you mean to set reuse=True in VarScope? Can you provide a working example? – Rahul Dec 12 '17 at 01:53
5

This is essentially the same as the first answer, but with a slight variation in how the scope names are used and with added dropout wrappers. It also takes care of the variable-scope error reported in the comment on the first answer.

def bidirectional_lstm(input_data, num_layers, rnn_size, keep_prob):

    output = input_data
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer),reuse=tf.AUTO_REUSE):

            # By giving a different variable scope to each layer, I've ensured that
            # the weights are not shared among the layers. If you want to share the
            # weights, you can do that by giving variable_scope as "encoder" but do
            # make sure first that reuse is set to tf.AUTO_REUSE

            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, input_keep_prob = keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.truncated_normal_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, input_keep_prob = keep_prob)

            outputs, states = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                              cell_bw, 
                                                              output,
                                                              dtype=tf.float32)

            # Concat the forward and backward outputs
            output = tf.concat(outputs,2)

    return output
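
For context, here is a minimal usage sketch (the placeholder shapes and sizes below are illustrative assumptions, not part of the answer):

# Hypothetical input: batches of sequences with 50 timesteps and 128 features.
input_data = tf.placeholder(tf.float32, [None, 50, 128])
keep_prob = tf.placeholder_with_default(1.0, shape=())

# Two stacked BiLSTM layers of size 256 each.
encoded = bidirectional_lstm(input_data, num_layers=2, rnn_size=256, keep_prob=keep_prob)
# encoded has shape [batch_size, 50, 2 * rnn_size]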
betelgeuse
  • I have a question related to that. I concatenated the outputs and reshaped them using `output = tf.reshape(tf.concat(output,1), [-1, 2 * rnn_size])`, so the dimension is now (batch_size * timesteps, 2*rnn_size). When I pass that through a dense layer with `logits = tf.matmul(output, weight) + bias`, the dimension becomes (batch_size * timesteps, num_classes). These are my logits. How can I then compute the loss with `tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))`, given that the shape of the Y placeholder is [None, num_classes]? – ARAT Sep 13 '18 at 21:04
  • You can't directly. You need to eliminate that timestep dimension. Is there any specific reason to use the output of all timesteps? Generally, we take output at the last time step only. You can do this by returning `output = output[:,-1,:]`. Now logits would be `[batch_size,num_classes]` – betelgeuse Sep 14 '18 at 07:08
Thank you very much for your quick response. To be honest, this is how I learned LSTM. Like in [this example](http://adventuresinmachinelearning.com/recurrent-neural-networks-lstm-tutorial-tensorflow/), they flatten the output and use it to compute logits, without eliminating the timesteps. I am a bit confused now. – ARAT Sep 14 '18 at 12:32
  • He did that because he's using `tf.contrib.seq2seq.sequence_loss`, which expects the time_step dimension. Notice that once the logits are calculated, he reshapes them back to the original shape. In your case, you want to use `tf.nn.softmax_cross_entropy_with_logits`, which won't take that shape. It requires the last time_step only. – betelgeuse Sep 14 '18 at 12:41
  • Oh, I understand. So you're saying that before the dense layer and softmax, I should take only the last time step of the data and go from there? – ARAT Sep 14 '18 at 12:47
  • If you want to use `tf.nn.softmax_cross_entropy_with_logits` then yes. Though in that particular problem, you might want to use `seq2seq` loss. – betelgeuse Sep 14 '18 at 12:57
  • For the one in the link? Yeah, I understand. – ARAT Sep 14 '18 at 12:58
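
Putting the comment thread above together, a rough sketch of the last-time-step approach (weight, bias, num_classes and Y are the names used in the comments; the shapes are assumptions):

# output: [batch_size, timesteps, 2 * rnn_size] from the stacked BiLSTM
last_output = output[:, -1, :]                      # [batch_size, 2 * rnn_size]

# Dense layer producing class scores.
weight = tf.Variable(tf.truncated_normal([2 * rnn_size, num_classes], stddev=0.1))
bias = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(last_output, weight) + bias      # [batch_size, num_classes]

# Y has shape [None, num_classes], so the shapes now match.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))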
2

On top of Taras's answer, here is another example, using just a 2-layer bidirectional RNN with GRU cells:

    embedding_weights = tf.Variable(tf.random_uniform([vocabulary_size, state_size], -1.0, 1.0))
    embedding_vectors = tf.nn.embedding_lookup(embedding_weights, tokens)

    #First BLSTM
    cell = tf.nn.rnn_cell.GRUCell(state_size)
    cell = tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=1-dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell, cell, inputs=embedding_vectors,
                                        sequence_length=lengths, dtype=tf.float32,scope='BLSTM_1')
    outputs = tf.concat([forward_output, backward_output], axis=2)

    #Second BLSTM using the output of previous layer as an input.
    cell2 = tf.nn.rnn_cell.GRUCell(state_size)
    cell2 = tf.nn.rnn_cell.DropoutWrapper(cell2, output_keep_prob=1-dropout)
    (forward_output, backward_output), _ = \
        tf.nn.bidirectional_dynamic_rnn(cell2, cell2, inputs=outputs,
                                        sequence_length=lengths, dtype=tf.float32,scope='BLSTM_2')
    outputs = tf.concat([forward_output, backward_output], axis=2)

BTW, don't forget to give each layer a different scope name. Hope this helps.
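
For completeness, the tensors the snippet above assumes could be defined along these lines (the sizes are illustrative assumptions):

vocabulary_size = 10000   # assumed vocabulary size
state_size = 128          # assumed GRU state size
dropout = 0.2             # assumed dropout rate

tokens = tf.placeholder(tf.int32, [None, None])   # [batch_size, max_time] token ids
lengths = tf.placeholder(tf.int32, [None])        # true length of each sequence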

2

As @Taras pointed out, you can use:

(1) tf.nn.bidirectional_dynamic_rnn()

(2) tf.contrib.rnn.stack_bidirectional_dynamic_rnn().

All previous answers only capture (1), so I give some details on (2), in particular since it usually outperforms (1). For an intuition about the different connectivities see here.

Let's say you want to create a stack of 3 BLSTM layers, each with 64 nodes:

num_layers = 3
num_nodes = 64


# Define LSTM cells
enc_fw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]
enc_bw_cells = [LSTMCell(num_nodes) for layer in range(num_layers)]

# Connect LSTM cells bidirectionally and stack
(all_states, fw_state, bw_state) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
        cells_fw=enc_fw_cells, cells_bw=enc_bw_cells, inputs=input_embed, dtype=tf.float32)

# Concatenate results
for k in range(num_layers):
    if k == 0:
        con_c = tf.concat((fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((fw_state[k].h, bw_state[k].h), 1)
    else:
        con_c = tf.concat((con_c, fw_state[k].c, bw_state[k].c), 1)
        con_h = tf.concat((con_h, fw_state[k].h, bw_state[k].h), 1)

output = tf.contrib.rnn.LSTMStateTuple(c=con_c, h=con_h)

In this case, I use the final states of the stacked biRNN rather than the outputs at all timesteps (saved in all_states), since I was using an encoder-decoder scheme, where the code above was only the encoder.
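
As an illustration of that encoder-decoder use, the concatenated state can be fed to a decoder whose cell size matches it; this is only a sketch under assumed names (decoder_inputs is hypothetical), not part of the original code:

# Each of con_c / con_h has size num_layers * 2 * num_nodes,
# so the decoder cell must be that wide to accept the state directly.
decoder_cell = LSTMCell(num_layers * 2 * num_nodes)

# decoder_inputs: hypothetical tensor of shape [batch_size, max_time, input_dim]
decoder_outputs, decoder_state = tf.nn.dynamic_rnn(
    decoder_cell, decoder_inputs, initial_state=output, dtype=tf.float32)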

dopexxx
  • Thank you for the detailed explanation. Can I ask about the "final states"? When the input sequences have different lengths, do the "final states" contain the actual final state for each input's true length, or might they include zero padding? – Jaeyoung Lee Nov 11 '20 at 03:07
  • This code snippet was written for `tf==1.X` and, if I remember correctly, it can't handle variable-length sequences out of the box. I always used zero-padding. TensorFlow 2.X may have a better solution for this, though – dopexxx Nov 11 '20 at 08:16